Following are some resources for interpreting human genome variation; this is the list I submitted to Nature for the Commentary. A more comprehensive list, compiled by Rania Horaitis, is available on the Human Genome Variation Society (HGVS) website.
dbSNP
http://www.ncbi.nlm.nih.gov/SNP/
Maintained by the National Center for Biotechnology Information (NCBI), this is a central repository for short nucleotide substitution, deletion, and insertion polymorphisms. The database has very little phenotypic information, but it has some links to OMIM, and NCBI plans to let users submit annotations.
There are no restrictions on use of dbSNP.
Sherry ST, Ward MH, Kholodov M, Baker J, Phan L, Smigielski EM, Sirotkin K. 2001. dbSNP: the NCBI database of genetic variation. Nucleic Acids Res 29:308-311. doi:10.1093/nar/29.1.308
OMIM, Online Mendelian Inheritance in Man
http://www.ncbi.nlm.nih.gov/sites/entrez?db=OMIM
This catalog of human genes and genetic disorders is authored and edited by Victor McKusick and colleagues. Its 18,000 entries are short essays summarizing the literature on each disease or gene. Mutations and phenotypes are often described only in free text.
OMIM cannot be used commercially or redistributed without a license.
Online Mendelian Inheritance in Man, OMIM (TM). McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University (Baltimore, MD) and National Center for Biotechnology Information, National Library of Medicine (Bethesda, MD). World Wide Web URL: http://www.ncbi.nlm.nih.gov/omim/
dbGaP
http://www.ncbi.nlm.nih.gov/sites/entrez?db=gap
The NCBI database of Genotypes and Phenotypes provides results from genome-wide association studies and other analyses. dbGaP provides two levels of access, open and controlled. Typically, anyone can see the overall associations, whereas authorization is needed to access personal health information. The European Bioinformatics Institute (EBI) is constructing a similar resource, known as the European Genotype Archive.
The open data in dbGaP are freely available and no data are protected by patents, while controlled data are restricted in use for privacy reasons.
SNPedia
http://www.snpedia.com
This is a fledgling, uncurated, Wikipedia-style effort to describe the functional consequences of SNPs. Open to contributions from anyone, it now describes 1,713 SNPs. Pages carry Google ads, though these are not expected to cover costs.
SNPedia will use a Creative Commons Attribution-Share Alike 3.0 Unported License.
HGMD, Human Gene Mutation Database
http://www.hgmd.cf.ac.uk/ac/index.php
This database collates published gene mutations and variations thought to be responsible for human inherited disease; it includes 73,411 mutation entries in over 2,000 genes.
To support its maintenance, the complete database is available commercially; a limited-use version containing content more than two years old is available to academic users free of charge.
Stenson PD, Ball EV, Mort M, Phillips AD, Shiel JA, Thomas NS, Abeysinghe S, Krawczak M, Cooper DN. 2003. The Human Gene Mutation Database (HGMD®): 2003 Update. Hum Mutat 21:577-581. doi:10.1002/humu.10212
Mitomap
http://www.mitomap.org/
Mitomap provides annotations of the human mitochondrial sequence, including information about polymorphisms and mutations. Most information is taken from the literature, but it is also possible to contribute new information online. For many variations, it indicates associated disease and the level of confidence in the association.
Several relevant tables are copyrighted by Elsevier and presumably not redistributable.
Ruiz-Pesini E, Lott MT, Procaccio V, Poole J, Brandon MC, Mishmar D, Yi C, Kreuziger J, Baldi P, Wallace DC. 2007. An enhanced MITOMAP with a global mtDNA mutational phylogeny. Nucleic Acids Research 35 (Database issue):D823-D828. doi:10.1093/nar/gkl927
GeneTests
http://www.genetests.org/
GeneTests provides summary information on 600 laboratories offering more than a thousand genetic tests for clinical and research use. It also includes 400 expert-authored, peer-reviewed “GeneReviews” that discuss application of genetic tests for diagnosis, management, and counseling of patients.
GeneTests provides aggregate data reports and grants permission to redistribute materials with attribution.
GeneTests: Medical Genetics Information Resource (database online). Copyright, University of Washington, Seattle. 1993-2007. Available at http://www.genetests.org.
Pagon RA. 2006. GeneTests: an online genetic information resource for health care providers. J Med Libr Assoc 94:343-8.
PharmGKB
http://www.pharmgkb.org/
The pharmacogenetics and pharmacogenomics knowledgebase relates genomic variation to phenotypes associated with pharmaceutical activity. It offers integrated knowledge in the form of gene summaries, pathways, and annotated literature. Currently, more than 22,400 samples have been genotyped, with 2,581,832 variants and 11,353 polymorphisms reported. There are approximately 250,000 phenotype measurements.
The data in PharmGKB are available to all research scientists, with individual data access requiring registration and subject to limitations to protect subject privacy.
Klein TE, Chang JT, Cho MK, Easton KL, Fergerson R, Hewett M, Lin Z, Liu Y, Liu S, Oliver DE, Rubin DL, Shafa F, Stuart JM, Altman RB. 2001. Integrating genotype and phenotype information: an overview of the PharmGKB project. The Pharmacogenomics Journal 1:167-170.
Chromosomal Variation in Man Online
http://www.wiley.com/borgaonkar
This is a compendium of 24,000 citations regarding chromosomal alterations, phenotypes, and abnormalities, collected by Digamber S. Borgaonkar. Citations are typically to individual cases, with a brief description of pathology observed.
Web access is free and results of searches can be redistributed for research purposes.
Locus Specific Databases Database
http://www.hgvs.org/dblist/glsdb.html
This is an enumeration of over 600 locus-specific databases.
The site is freely available, but copyright restrictions on redistribution are enforced.
Horaitis O, Talbot Jr CC, Phommarinh M, Phillips KM, Cotton RGH. 2007. A database of locus-specific databases. Nature Genetics 39:425. doi:10.1038/ng0407-425
SIFT
http://blocks.fhcrc.org/sift/SIFT.html
SIFT is a program for predicting whether an amino acid substitution affects protein function, based on sequence homology and the physical properties of amino acids. It is one of several programs using sequence information for this purpose.
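The homology idea behind such predictors can be illustrated with a deliberately simplified sketch. This is not SIFT's actual algorithm, which among other things uses more sophisticated pseudocounts. It assumes a toy multiple alignment of homologous protein sequences and estimates amino acid frequencies at one column with a flat pseudocount (an illustrative assumption). Each frequency is scaled by that of the most common residue, and substitutions scoring below a 0.05 cutoff are flagged as likely deleterious. The function names, the pseudocount, and the alignment are hypothetical.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_probabilities(alignment, position, pseudocount=0.1):
    """Estimate amino acid probabilities at one alignment column.

    A flat pseudocount is an illustrative assumption; gaps ('-') are ignored.
    """
    counts = Counter(seq[position] for seq in alignment if seq[position] != "-")
    total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
    return {aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS}


def scaled_score(alignment, position, substitute):
    """Probability of `substitute`, scaled by the most probable residue."""
    probs = column_probabilities(alignment, position)
    return probs[substitute] / max(probs.values())


def predict(alignment, position, substitute, cutoff=0.05):
    """Label a substitution 'deleterious' if its scaled score is below cutoff."""
    score = scaled_score(alignment, position, substitute)
    return ("deleterious" if score < cutoff else "tolerated"), score


# Hypothetical alignment of five homologous sequences.
alignment = ["MKTAYIAK", "MKTAFIAK", "MRTAYIAK", "MKSAYLAK", "MKTAYIGK"]

print(predict(alignment, 0, "W")[0])  # fully conserved M column -> 'deleterious'
print(predict(alignment, 4, "F")[0])  # F already observed at this column -> 'tolerated'
```

A conserved column yields a low scaled score for any unobserved residue, whereas a column that already tolerates variation among homologs does not. That contrast is the intuition the sequence-based methods exploit.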
The software is freely available and can be modified and redistributed with attribution.
Ng PC, Henikoff S. 2002. Accounting for human polymorphisms predicted to affect protein function. Genome Research 12:436-446. doi:10.1101/gr.212802
SNPs3D
http://www.snps3d.org
SNPs3D provides predictions of the phenotypic impact of SNPs based on sequence, structure, and cellular networks. It is one of several resources using protein structure for this purpose.
Predictions on the website can be browsed freely, but the software is not available online.
Yue P, Melamud E, Moult J. 2006. SNPs3D: candidate gene and SNP selection for association studies. BMC Bioinformatics 7:166. doi:10.1186/1471-2105-7-166
MutaGeneSys
http://www.cs.columbia.edu/~jds1/MutaGeneSys/
This software accepts individuals' SNP data and maps OMIM records to them. Natural language processing identified only 133 unique SNPs associated with OMIM disease records, but the data are enriched with marker correlation data to yield a total of 1,300 population-specific correlations.
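The way marker correlations extend a small set of annotated SNPs can be sketched as follows. This is a hypothetical illustration, not MutaGeneSys's code or data format. It assumes each disease-associated SNP carries a risk allele and a record label. It also assumes correlations are given as r² values to genotyped proxy markers in a named population, with a proxy allele presumed to travel with the risk allele. A record is reported when the individual carries the risk allele directly, or else carries the tagging allele of a sufficiently correlated proxy in the requested population. All identifiers, the 0.8 threshold, and the data structures are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DiseaseVariant:
    rsid: str          # hypothetical SNP identifier
    risk_allele: str
    record: str        # hypothetical OMIM-style record label


@dataclass(frozen=True)
class Proxy:
    rsid: str            # marker actually present in the genotype data
    tagging_allele: str  # allele assumed to co-occur with the risk allele
    r2: float            # correlation with the disease SNP
    population: str


def match(genotypes, variants, proxies, population, min_r2=0.8):
    """Return (record, marker, 'direct' | 'proxy', r2) tuples for an individual.

    `genotypes` maps SNP id -> two-letter genotype string, e.g. "CT".
    """
    hits = []
    for v in variants:
        gt = genotypes.get(v.rsid)
        if gt is not None:
            # The disease SNP itself was genotyped: use it directly.
            if v.risk_allele in gt:
                hits.append((v.record, v.rsid, "direct", 1.0))
            continue
        # Otherwise fall back to correlated markers from the same population.
        for p in proxies.get(v.rsid, []):
            if p.population != population or p.r2 < min_r2:
                continue
            pg = genotypes.get(p.rsid)
            if pg is not None and p.tagging_allele in pg:
                hits.append((v.record, p.rsid, "proxy", p.r2))
                break
    return hits


variants = [
    DiseaseVariant("snp1", "T", "Disorder X"),
    DiseaseVariant("snp2", "G", "Disorder Y"),
    DiseaseVariant("snp3", "A", "Disorder Z"),
]
proxies = {
    "snp2": [Proxy("snp9", "C", 0.95, "CEU"), Proxy("snp8", "A", 0.5, "CEU")],
    "snp3": [Proxy("snp7", "G", 0.9, "YRI")],
}
genotypes = {"snp1": "CT", "snp9": "CC", "snp8": "AA", "snp7": "GG"}

for hit in match(genotypes, variants, proxies, "CEU"):
    print(hit)
```

Because correlations differ between populations, the same genotype can map to different records depending on the population assumed. That is why such correlations are population-specific.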
The software can be freely downloaded.
Stoyanovich J, Pe’er I. 2007. MutaGeneSys: making diagnostic predictions based on genome-wide genotype data in association studies. Columbia University Technical Report, February 16, 2007. http://mice.cs.columbia.edu/getTechreport.php?techreportID=448&format=pdf
Craig Venter’s genome
http://www.jcvi.org/research/huref/
This is a launch point for the Venter Institute's sequence of J. Craig Venter, including the sequence, variants, traces, and the open-source assembler.
The data are freely available, the manuscript is open access under the Creative Commons Attribution License, and the Celera Assembler is open source.
Levy S, Sutton G, Ng PC, Feuk L, Halpern AL, et al. 2007. The diploid genome sequence of an individual human. PLoS Biol 5: e254. doi:10.1371/journal.pbio.0050254
James Watson’s genome
http://jimwatsonsequence.cshl.edu/cgi-perl/gbrowse/jwsequence/
This is a browser for Jim Watson's genome variants overlaid on a reference genome, with some OMIM entries automatically associated using MutaGeneSys.
Anyone can browse and download the sequence data and variants. The GMOD browser is open source.
Navigenics and 23andMe
http://www.navigenics.com
http://www.23andMe.com
23andMe and Navigenics are two of the most prominent new companies intending to provide personal genome interpretation.
Genome Commons
http://www.GenomeCommons.org
This new website supports the genome commons and is a portal to associated resources.
Contents are copyrighted, with most content available under the Creative Commons Attribution License.
Brenner SE. 2007. Common sense for our genomes. Nature 449:783-784. doi:10.1038/449783a
There are several groups that have developed software to interpret the phenotypic impact of SNPs. Following is a brief bibliography of some such programs with which I am familiar.
Ng PC, Henikoff S. 2003. SIFT: predicting amino acid changes that affect protein function. Nucleic Acids Research 31:3812-3814. doi:10.1093/nar/gkg509
Wang Z, Moult J. 2001. SNPs, protein structure, and disease. Human Mutation 17:263-270. doi:10.1002/humu.22
Karchin R, Monteiro ANA, Tavtigian SV, Carvalho MA, Sali A. 2007. Functional impact of missense variants in BRCA1 predicted by supervised learning. PLoS Comput Biol 3: e26. doi:10.1371/journal.pcbi.0030026
Yue P, Melamud E, Moult J. 2006. SNPs3D: candidate gene and SNP selection for association studies. BMC Bioinformatics 7:166. doi:10.1186/1471-2105-7-166
Bao L, Zhou M, Cui Y. 2005. nsSNPAnalyzer: identifying disease-associated nonsynonymous single nucleotide polymorphisms. Nucleic Acids Research 33(Web Server Issue):W480-W482. doi:10.1093/nar/gki372
Ramensky V, Bork P, Sunyaev S. 2002. Human non-synonymous SNPs: server and survey. Nucleic Acids Research 30:3894-3900. http://nar.oxfordjournals.org/cgi/content/full/30/17/3894
Thomas PD, Kejariwal A. 2004. Coding single-nucleotide polymorphisms associated with complex vs. Mendelian disease: evolutionary evidence for differences in molecular effects. Proc Natl Acad Sci 101:15398-15403. doi:10.1073/pnas.0404380101
Stone EA, Sidow A. 2005. Physicochemical constraint violation by missense substitutions mediates impairment of protein function and disease severity. Genome Research 15:978-986. doi:10.1101/gr.3804205
Chasman D, Adams RA. 2001. Predicting the functional consequences of non-synonymous single nucleotide polymorphisms: structure-based assessment of amino acid variation. Journal of Molecular Biology 307:683-706. doi:10.1006/jmbi.2001.4510
Saunders CT, Baker D. 2002. Evaluation of structural and evolutionary contributions to deleterious mutation prediction. Journal of Molecular Biology 322:891-901. doi:10.1016/S0022-2836(02)00813-6
Ye ZQ, Zhao SQ, Gao G, Liu XQ, Langlois RE, Lu H, Wei L. 2007. Finding new structural and sequence attributes to predict possible disease association of single amino acid polymorphism (SAP). Bioinformatics 23:1444-1450. doi:10.1093/bioinformatics/btm119
Giardine B, Riemer C, Hefferon T, Thomas D, Hsu F, Zielenski J, Sang Y, Elnitski L, Cutting G, Trumbower H, Kern A, Kuhn R, Patrinos GP, Hughes J, Higgs D, Chui D, Scriver C, Phommarinh M, Patnaik SK, Blumenfeld O, Gottlieb B, Vihinen M, Väliaho J, Kent J, Miller W, Hardison RC. 2007. PhenCode: connecting ENCODE data with mutations and phenotype. Human Mutation 28:554-562. doi:10.1002/humu.20484
Stoyanovich J, Pe’er I. 2007. MutaGeneSys: making diagnostic predictions based on genome-wide genotype data in association studies. Columbia University Technical Report, February 16, 2007. http://www.cs.columbia.edu/~jds1/MutaGeneSys/
Han A, Kim W-Y, Park S-M. 2007. SNP2NMD: a database of human single nucleotide polymorphisms causing nonsense-mediated mRNA decay. Bioinformatics 23:397-399. doi:10.1093/bioinformatics/btl593
Cavallo A, Martin AC. 2005. Mapping SNPs to protein sequence and structure data. Bioinformatics 21:1443-1450. http://www.bioinf.org.uk/saap/db/