About

Aim & Scope


The TP53 Database compiles TP53 variant data that have been reported in the published literature since 1989 or are available in other public databases. Database releases are identified by a number.
The current release is R20 (July 2019). The following data are available:
The TP53 Database is meant to be a source of information on TP53 variants for a broad range of scientists and clinicians who work in different research areas:
  • Basic research, to study the structural and functional aspects of the p53 protein and the TP53 gene
  • Molecular pathology of cancer, to understand the clinical significance of TP53 variants identified in cancer patients
  • Molecular epidemiology of cancer, to analyze the links between specific exposures and TP53 variant patterns in order to make inferences about possible causes of cancer
  • Molecular genetics, to analyze genotype/phenotype relationships
The database includes various annotations on the predicted or experimentally assessed functional impact of TP53 variants, clinicopathologic characteristics of tumors, and demographic and lifestyle information on patients. This information is useful for compiling tumor-specific variant patterns, for formulating hypotheses on the nature of the molecular events involved in TP53 mutagenesis, and for analyzing genotype/phenotype relationships.
Detailed information on data and annotations available is provided in the User Manual.
The ongoing project involves:
  • Performing regular review of the literature on TP53 variants
  • Extracting TP53 data from genetic and genomic databases
  • Developing standard annotations of TP53 variants
  • Performing research on TP53 variants, their patterns, origins and clinical impacts.

Database Development


  • This R20 release compiles data on around 29,900 tumor variants, 9,200 variants reported in SNP databases, 1,530 cancer families/individual carriers of a germline variant, 2,700 cell-lines, 900 experimentally induced variants, and functional data on over 9,000 mutant proteins. Variant descriptions are provided on both hg19 and hg38 genome builds.
  • Variant descriptions have been revised to better comply with HGVS nomenclature.
  • The dataset of germline variants (germline variants in cancer patients/families) has been updated with data published between June 2018 and June 2019.
  • A new dataset including NGS studies reporting the frequency of individual TP53 germline variants in case-control series is provided.
  • A large dataset on the functional impact of over 8200 p53 mutant proteins (Giacomelli et al., 2018) has been curated to add a new classification reflecting loss of function and dominant-negative effects (DNE_LOF class). A link to a website providing interactive access to these data is provided.
  • Four papers on functional activities of mutant proteins have been curated.
  • Data on polymorphisms (variants frequent in healthy human populations) have been curated to include the most recent data from the dbSNP152 and gnomAD databases. Allelic frequencies have been retrieved from these databases to classify variants as "validated polymorphisms" if they are found at MAF>=0.001.
  • The dataset of tumor variants is no longer updated as most new data are available in several other data portals (cBioportal, GENIE, COSMIC, ICGC). Instead, tumor variant counts from TCGA, ICGC and GENIE datasets for each individual variant are provided.
  • For data on TP53 status in cell-lines, a new link to the depmap resource has been created.
  • Variant data from ClinVar, June 2019 release, were curated.
  • Links to COSMIC, ClinVar, gnomAD and dbSNP have been updated.
  • The dataset of induced variants is unchanged as no new data were found.
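The "validated polymorphism" rule in this release can be illustrated with a short sketch. Only the MAF>=0.001 threshold comes from the release notes; the function name, the dictionary layout, and the choice to treat a single source as sufficient are assumptions made for illustration, not the database's actual implementation.

```python
# Threshold stated in the R20 release notes (MAF >= 0.001).
MAF_THRESHOLD = 0.001


def is_validated_polymorphism(maf_by_source):
    """Return True if any reported minor allele frequency meets the threshold.

    maf_by_source maps a source name (for example "gnomAD") to a MAF,
    or to None when the source reports no frequency.
    Treating one qualifying source as sufficient is an assumption of this sketch.
    """
    return any(
        maf is not None and maf >= MAF_THRESHOLD
        for maf in maf_by_source.values()
    )
```

A variant with no frequency data in any source is not classified as a validated polymorphism by this sketch.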

  • This R19 release compiles data on over 29000 somatic variants, 8000 variants reported in SNP databases, 1200 germline variants related to Li-Fraumeni syndrome, 2700 cell-lines, 900 experimentally induced variants, and functional data on over 4400 mutant proteins.
  • Variant descriptions are now provided on both hg19 and hg38 genome builds.
  • The dataset of germline variants (rare disease-causing variants) has been updated with data published between January 2016 and June 2018. The dataset has increased by 30%!
  • The dataset of functional impact of p53 mutant proteins has been updated with 12 studies, including one major study that analyzed over 9500 DNA-binding domain variants for their growth suppression activity (see Kotler et al.).
  • Data on polymorphisms (variants frequent in healthy human population) have been curated to include most recent data from dbSNP151, Flossie, gnomAD, 1000G and ESP6500 databases for the full TP53 gene sequence, including 5'UTR and 3'UTR regions (hg38, chr17:7661725-7689853). Allelic frequencies have been retrieved from these databases to classify variants as "validated polymorphisms" (MAF > 0.001 in at least one of these databases).
  • New annotations have been added on the predicted functional impact of variants by REVEL, BayesDel and an optimized Align-GVGD algorithm (see Fortuno et al.).
  • Direct links to individual variants in other databases have been added: ClinVar, COSMIC, gnomAD.
  • The dataset of tumor variants has not been updated as most new data are captured in other databases (cBioportal, COSMIC, TCGA and ICGC data portals). Instead we added tumor variant counts from cBioportal for each individual variant.
  • The dataset of induced variants has not been changed because no new data were found.
  • Data on TP53 status in cell-lines have not been updated.
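The hg38 interval quoted above for the full TP53 gene sequence (chr17:7661725-7689853) can be used to check whether a genomic position falls within the curated region. The sketch below is a minimal illustration; the function names are hypothetical, and treating both interval ends as inclusive is an assumption.

```python
import re

# hg38 interval quoted in the R19 release notes for the full TP53 gene sequence.
TP53_REGION_HG38 = "chr17:7661725-7689853"


def parse_region(region):
    """Split a 'chrN:start-end' string into (chromosome, start, end)."""
    match = re.fullmatch(r"(chr\w+):(\d+)-(\d+)", region)
    if match is None:
        raise ValueError(f"Unrecognized region string: {region!r}")
    return match.group(1), int(match.group(2)), int(match.group(3))


def in_region(chrom, pos, region=TP53_REGION_HG38):
    """Return True if (chrom, pos) lies inside the region, ends inclusive (assumed)."""
    region_chrom, start, end = parse_region(region)
    return chrom == region_chrom and start <= pos <= end
```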

  • This R18 release compiles data on over 29,000 tumor variants, 891 germline variants, 2700 cell-lines, functional data on 2314 mutant proteins, and over 900 experimentally induced variants (related to exposure to 17 different carcinogens).
  • The dataset of germline variants (rare disease-causing variants) has been updated with data published between 2013 and 2015.
  • The dataset of tumor variants has been updated with one TCGA study on glioblastoma.
  • The dataset of induced variants has been updated with 6 studies.
  • Data on polymorphisms (neutral variants frequent in healthy human population) in dbSNP, ESP and 1000G have been reviewed to update variant classification.
  • Data on TP53 status in cell-lines have been updated with recent CCLE data.
  • Variant descriptions are based on hg19 reference sequence but chromosome coordinates for genome builds hg18 and hg38 are also provided in downloaded files.
  • New annotations have been added: functional impact of variant predicted by Polyphen2, trinucleotide sequence context of variants.
  • The web interface has been updated to fix some issues with Jmol (Jmol was upgraded to JSmol to solve compatibility issues with some browsers) and to allow the input of a list of variations for querying all datasets. Some pages were also redesigned to improve navigation.

  • This R17 release compiles data on over 28,000 tumor variants, 750 germline variants, 2700 cell-lines, functional data on 2314 mutant proteins, and over 700 experimentally induced variants (from exposure to 13 different carcinogens).
  • A new dataset on experimentally induced variants has been added. It compiles variants obtained in mutagenicity assays using the human TP53 gene (Hupki MEF and yeast assays).
  • The dataset of germline variants has been updated with data published between July 2012 and July 2013. Data on the prevalence of germline variants have been fully integrated in the database scheme and made searchable through the interactive web search.
  • The dataset of tumor variants has been updated with selected studies (ovarian cancer and adrenocortical carcinoma).
  • The dataset of functional impact of p53 mutant proteins has been updated with one study reporting DNE activity of 100 different mutants in a yeast assay (Monti et al., 2011).
  • The genome build hg19 is now used as default for describing variants at the genome level.
  • The web interface has been updated to reflect the changes in database contents and annotations described above.

  • This R16 release compiles the occurrence of 29575 tumor variants, 635 germline variants, functional data on 2314 mutant proteins and TP53 gene status of 2708 cell-lines.
  • The dataset of germline variants has been updated with data published between November 2010 and July 2012.
  • The dataset of cell-lines has been updated with reports published in 2010 and with data from the Cancer Cell Line Encyclopedia of the Broad Institute.
  • The datasets of functional impact of p53 mutant proteins and of tumor variants (and associated Prognosis and Prevalence datasets) have been partially updated. For tumor variants, data from recent large scale or whole genome sequencing studies have been curated (with the collaboration of COSMIC), as well as studies from IARC and data on rare cancers or cancers for which variant data are not well represented in the database (such as upper urinary tract, kidney or head and neck cancers). Studies published between 2008 and 2011 have been included.
  • The web interface has been upgraded to ASP.NET for better functionality and interactivity. New search options are available, including the full analysis of the germline variant dataset, the possibility to input a list of variations, and the retrieval of full annotations on functional/structural impacts of variations and their frequency of occurrence.

  • This R15 release contains 27580 tumor variants, 597 germline variants, functional data on 2314 mutant proteins and TP53 gene status of 2263 cell-lines.
  • The dataset of Tumor variants (and associated Prognosis and Prevalence datasets) has been partially updated. Data from recent large scale or whole genome sequencing studies have been curated (with the collaboration of COSMIC) as well as data on liver and breast cancers published in 2008 and 2009.
  • The dataset of germline variants has been updated with data published between September 2009 and October 2010.
  • The dataset of cell-lines has been updated with reports published in 2008 and 2009 and with data from the COSMIC Cell-line database. An extensive review of COSMIC data has been performed as we noticed that several cell-lines linked to Ref_ID 2056 (COSMIC database) were wrongly annotated as WT in previous releases. These entries were in fact cells not yet analyzed for TP53. This mistake has been corrected in the current version.
  • The datasets of functional activities of p53 mutant proteins and of mouse-models have not been updated.
  • A new set of data is available, the TP53_R249S dataset, that provides data on the prevalence of the p.R249S variant in liver cancers. It includes studies that have screened exon 7 of TP53 by sequencing as well as studies that have searched for this specific variant by RFLP. These data are also included in the Prevalence dataset. Data from studies that have analyzed only codon 249 are not included in the tumor variant dataset. The presence of this variant in hepatocellular carcinomas has been linked to exposure to aflatoxins and HBV, and may thus constitute a biomarker of exposure. This dataset can be downloaded from the Dataset Downloads page.

  • Up to now, the IARC TP53 Database policy was to include all published papers reporting data on TP53 variants, even if poor quality or artefactual data were suspected. The idea was to reflect literature data. However, due to the increasing risk of misuse of the database (more non-specialist users), we decided to change the database policy.
    Thus, starting with this R15 release of the database (November 2010), the following apply to the dataset of tumor variants:

    • New papers of poor quality have not been curated;
    • Papers of poor quality previously included in the database are excluded from the online search results. These papers will remain available through the "Dataset Downloads" option and will be identified by an annotation field called "Exclude analysis". Twenty-one papers (accounting for 972 variants) have been identified as poor quality.

    Poor data quality is suspected when:

    • Several unusual variants, especially silent or functional (based on transactivation assays) ones, are described;
    • More than 3 variants are found per sample;
    • The same variant is found in several samples of the same series, especially unusual variants or silent or functional variants;
    • The conditions above occur in studies that did not confirm variants in independent PCR products and/or used nested PCR.
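The two countable criteria in this list (more than 3 variants per sample, and the same variant recurring across samples of one series) can be screened for automatically before manual review. The sketch below is only an illustration and not the curators' procedure: the function name, the record layout, and the recurrence threshold of 3 samples are assumptions, and it does not model whether a variant is unusual, silent or functional, or how variants were confirmed.

```python
from collections import defaultdict


def screening_flags(records, max_variants_per_sample=3, min_samples_for_recurrence=3):
    """Flag samples and variants in one study that merit manual review.

    records is an iterable of (sample_id, variant) pairs from a single series.
    min_samples_for_recurrence is a hypothetical cut-off for "several samples".
    """
    variants_by_sample = defaultdict(set)
    samples_by_variant = defaultdict(set)
    for sample_id, variant in records:
        variants_by_sample[sample_id].add(variant)
        samples_by_variant[variant].add(sample_id)

    # Samples carrying more distinct variants than the stated limit of 3.
    samples_over_limit = sorted(
        sample for sample, variants in variants_by_sample.items()
        if len(variants) > max_variants_per_sample
    )
    # Variants seen in at least the assumed number of distinct samples.
    recurrent_variants = sorted(
        variant for variant, samples in samples_by_variant.items()
        if len(samples) >= min_samples_for_recurrence
    )
    return {
        "samples_over_limit": samples_over_limit,
        "recurrent_variants": recurrent_variants,
    }
```

A flag from this sketch would only mark a paper for closer inspection; the decision to exclude it rests on the full set of criteria above.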

  • The dataset of tumor variants has been updated with data reported in publications edited in PubMed in 2007 and the dataset of germline variants has been updated with data reported in publications edited between August 2008 and August 2009. The datasets of functional activities of p53 mutant proteins and of mouse-models have not been updated. This R14 release contains 26597 somatic variants, 535 germline variants, functional data on 2314 mutant proteins and TP53 gene status of 1993 cell-lines.
  • New annotations have been generated on the predicted impact of variations on the status of p53 protein isoforms (see p53 Isoforms Predictions for details). These annotations can be retrieved from the 'Variant Validation' search option.
  • Statistics on the prevalence of TP53 germline variants in selected cohorts have been added.

  • A new reference sequence is used for the TP53 gene. This sequence is based on the most recent version of the genome assembly. Its GenBank reference is NC_000017 (7512445..7531642). Sequences of the introns are more complete and accurate than in refseq X54156. The only change in exons is at the polymorphic site in codon 72, where a C is present in the new refseq, while a G was present in refseq X54156. Note that some variants have been reannotated in light of the new sequence.
  • The database has been updated with data reported in publications edited in PubMed in 2007. All datasets have been updated except the dataset of somatic variants. This R13 release contains 24785 tumor variants, 423 germline variants, functional data on 2314 mutant proteins and TP53 gene status of 1894 cell-lines.
  • New annotations on the predicted impact of variations on splicing have been generated with available algorithms (see details in the Help section).
  • Information on polymorphism frequency has been added in the dataset of polymorphisms.
  • New annotations were introduced for the description of variations to comply with HGVS standards.
  • All missense variants reported in the IARC TP53 database have been submitted to SwissProt. Links between the two databases are thus now available.

  • The database has been updated with data reported in publications edited in PubMed in 2006. This R12 release contains 24810 tumor variants, 399 germline variants, functional data on 2314 mutant proteins and TP53 gene status of 1886 cell-lines.
  • A new dataset on mouse-models with engineered p53 has been added. There are 545 models listed in R12.
  • In the Function dataset 1, data on endogenous mutants have been added and an "assay design" has been assigned to each experiment to emphasize the quality of the data.
  • Data on protein stability of some mutants have been added.

  • The database has been updated with data reported in publications edited in PubMed in 2005. This R11 release contains 23,544 tumor variants, 376 germline variants, functional data on 2314 mutant proteins and TP53 gene status of 1569 cell-lines.
  • The dataset on TP53 gene status in cell-lines has been greatly extended and a search tool has been developed to easily retrieve data.
  • The list of polymorphisms has been extended and updated with data from SNP databases, and links to other databases providing additional information on population frequency and disease associations have also been included.
  • A new classification for the transactivation capacity of mutant proteins has been generated from the original data by Kato et al. (2003).
  • New tools and information have been added in the "TP53 resources" section, including a viewer of the genomic sequence of TP53 with exon-intron boundaries and highlighted polymorphic sites, and a list of p53 response-elements with their sequence and location.

  • Annotations on the functional impact (predicted or observed in experimental assays) of variations have been added, as well as nucleotide substitution rates for each specific variant. New interactive scatter plot graphs can be drawn to display this information.
  • The dataset on the transactivation activities of 2314 missense mutants on 8 p53-target genes generated by Kato et al. (2003) has been integrated in the database and can be downloaded with the variant dataset.
  • A variant validation tool has been implemented to check if a variant is listed in the database and what is its functional impact.
  • A summary view of the properties of specific variants can be obtained from the scatter plot graphs or from the variant validation tool.
  • A search option has been added to analyze tumor spectrum associated with specific germline variants.

  • The database has been updated with data reported in publications edited in PubMed in 2004. This R10 release contains 21,587 tumor variants, 283 germline variants and functional data on 426 mutant proteins.
  • All datasets can now be searched through SRS, a search system that allows users to perform multiple user-defined queries and retrieve data in table or html formats.

  • The database has been updated with data reported in publications edited in PubMed between March 2003 and December 2003. This R9 release contains 19,809 somatic variants, 264 germline variants and functional data on 423 mutant proteins.
  • A tool has been developed to search and display data and annotations on the functional properties of mutants.
  • A tool has been developed to visualize the 3D structure of the DNA-binding domain of p53 and color specific codons.
  • A list of cell-lines for which the TP53 gene status is known can be downloaded as a tab-delimited text file.
  • The different datasets previously maintained separately in different MS Access databases have been integrated in one single relational database maintained in MS SQL server centered on the gene variation module, allowing a better consistency in the description of variations across datasets.
  • Duplicates/errors have been identified and removed from the somatic dataset (40 variants).

  • The database has been updated with data reported in publications edited in PubMed between July 2002 and February 2003. This R8 release contains 18,585 somatic variants, 225 germline variants and functional data on 206 mutant proteins.
  • A new dataset has been created that provides annotations on the functional impact of variations. It includes p53 mutants that have been tested in functional assays in yeast or human cells. This dataset can be downloaded as a tab-delimited text file.
  • Duplicates/errors have been identified and removed from the somatic dataset (28 variants).

  • The database has been updated with data reported in publications edited in PubMed between July 2001 and June 2002. This R7 release contains 17,689 somatic variants and 225 germline variants.
  • A new dataset has been created that provides information on the prognostic value of TP53 gene variants. It includes a list of studies that have investigated the prognostic value of TP53 gene variant status in specific cancers. Study design and main findings are indicated. This dataset can be downloaded as a tab-delimited text file.
  • The dataset on the prevalence of TP53 gene variants has been extended and made available as an Access file.
  • Duplicates/errors have been identified and removed from the somatic dataset (73 variants).

  • The database has been updated with data reported in publications edited in PubMed between January 2001 and June 2001. This R6 release contains 16,285 somatic variants and 213 germline variants.
  • New annotations have been added to search individuals by population group, and variants by genomic nucleotide position.
  • Duplicates have been identified and removed from the somatic dataset (385 variants). Constant efforts are made to check for errors and track these duplicates. Despite these efforts, some may remain in the database and their identification is an ongoing task.

  • The database has been updated with data reported in publications edited in PubMed between May and December 2000. This R5 release contains 15,121 somatic variants and 196 germline variants.
  • A web application has been created to search and analyze the dataset of somatic variants.
  • A new dataset has been created that provides data on the prevalence of TP53 gene variants. Tumor site and sample numbers are indicated. This dataset can be downloaded as a tab-delimited text file.
  • A slide-show on TP53 variants in human cancer is freely available as a downloadable ppt file.
  • Duplicates have been identified and removed from the somatic dataset (388 variants). Constant efforts are made to check for errors and track these duplicates. Despite these efforts, some may remain in the database and their identification is an ongoing task.

  • The database has been updated with data reported in publications edited in PubMed between January 1998 and April 2000. This R4 release contains 14,000 somatic variants and 144 germline variants.
  • The datasets of somatic and germline variants can be downloaded as tab-delimited text files.
  • A new database has been created in MS Access with a relational scheme to maintain data on germline TP53 variants and Li-Fraumeni and Li-Fraumeni-Like syndromes.
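Several release notes above mention datasets that can be downloaded as tab-delimited text files. A minimal sketch of how such a file might be read and summarized is shown below. The column names and the rows are made-up placeholders and do not reflect the actual headers or contents of the download files, which are described in the User Manual.

```python
import csv
import io
from collections import Counter


def count_by_column(handle, column):
    """Count rows per distinct value of one column in a tab-delimited file."""
    reader = csv.DictReader(handle, delimiter="\t")
    return Counter(row[column] for row in reader)


# Placeholder content standing in for a downloaded file (hypothetical columns).
sample = io.StringIO(
    "Variant_label\tTumor_site\n"
    "variant_A\tsite_1\n"
    "variant_B\tsite_1\n"
    "variant_A\tsite_2\n"
)
counts = count_by_column(sample, "Tumor_site")
```

With a real download, the `io.StringIO` object would be replaced by a file opened with `open(path, newline="")`.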

How to Cite


When using the database, authors should cite the following source:
The TP53 Database (R20, July 2019): https://tp53.cancer.gov
and refer to de Andrade K.C. et al. in the bibliography as below:
de Andrade, K.C., Lee, E.E., Tookmanian, E.M. et al. The TP53 Database: transition from the International Agency for Research on Cancer to the US National Cancer Institute. Cell Death Differ 29, 1071–1073 (2022). https://doi.org/10.1038/s41418-022-00976-3

Credits


This current website and database are being maintained and hosted by the Division of Cancer Epidemiology and Genetics at the NCI, which is a component of the NCI Cancer Research Data Commons. This project has been funded in whole or in part with Federal funds from the National Cancer Institute, National Institutes of Health, Department of Health and Human Services.
The original IARC TP53 database was initiated by Monica Hollstein and Curt C. Harris in 1991 and further developed and maintained by Pierre Hainaut and Magali Olivier.
Curators: Magali Olivier (2000-2020), Caroline Gaud (2018), Audrey Petitjean (2007-2012).
Current and past collaborators on the database project are:

Disclaimer


The TP53 Database is now being maintained and hosted by NCI CBIIT Development, and has been funded in whole or in part with Federal funds from the National Cancer Institute (NCI), National Institutes of Health, Department of Health and Human Services, under Contract No. HHSN261201400008C and ID/IQ Agreement No. 17X146 under Contract No. HHSN261201500003I.
The TP53 Database is a free service offered to the scientific community.
The original IARC TP53 database was maintained by WHO's International Agency for Research on Cancer, Lyon, France, and that responsibility has now been fully transferred to NCI. The data contained herein may be freely used, downloaded and reproduced, but are not for sale or for use in conjunction with commercial or promotional purposes, and any use shall be subject to an appropriate acknowledgement of the source.
The data contained herein are provided "as is" and NCI makes no representations or warranties, either expressed or implied, as to their accuracy, completeness or suitability for a particular purpose. Similarly, NCI makes no representations or warranties with regard to the non-infringement of third-party proprietary rights. Thus, NCI does not accept any responsibility or liability with regard to the reliance on, and/or use of, such data.
Assertions about the phenotypic effects of variants are provided by multiple sources, have different levels of experimental support, and may conflict. NCI does not independently verify assertions and cannot endorse their accuracy. Information obtained through this resource is not a substitute for professional genetic counseling and is not intended for use as the basis of medical decision making.

Contact


When reporting an issue, it is important that you supply us with detailed information.
Provide the type of browser you are using and the steps to recreate the issue.