1000 Genomes Project (Phase 3 release)
UID: 11499
Publisher(s): IGSR: The International Genome Sample Resource
- Description
- Summary from IGSR:
"The goal of the 1000 Genomes Project was to find common genetic variants with frequencies of at least 1% in the populations studied.
The 1000 Genomes Project took advantage of developments in sequencing technology, which sharply reduced the cost of sequencing. It was the first project to sequence the genomes of a large number of people, to provide a comprehensive resource on human genetic variation. Data from the 1000 Genomes Project was quickly made available to the worldwide scientific community through freely accessible public databases.
Sequencing remained too expensive to deeply sequence the many samples being studied in the project. However, any particular region of the genome generally contains a limited number of haplotypes. Data was combined across samples to allow efficient detection of most of the variants in a region. The project planned to sequence each sample to 4x genomic coverage; at this depth, sequencing can not discover all variants in each sample, but can allow the detection of most variants with frequencies as low as 1%. In the final phase of the project, data from 2,504 samples was combined to allow highly accurate assignment of the genotypes in each sample at all the variant sites the project discovered. The multi-sample approach combined with genotype imputation allowed the project to determine a sample’s genotype, even in variants not covered by sequencing reads in that sample.
The contribution of the 1000 Genomes Project to genomics was summarised in Nature in the issue containing the final publications from the main project."
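The reasoning behind the low-coverage design in the summary above can be illustrated with some rough arithmetic. The sketch below is not the project's method: it assumes read depth at a site follows a Poisson distribution, that a heterozygous carrier contributes half of its reads to the alternate allele, and that carriers are independent. The function names are hypothetical and exist only for this illustration.

```python
import math


def prob_alt_allele_unseen(mean_depth, carriers):
    """Probability that no read shows the alternate allele in any carrier.

    Assumption: each heterozygous carrier yields Poisson(mean_depth / 2)
    reads carrying the alternate allele, independently of other carriers.
    """
    alt_depth = mean_depth / 2.0
    return math.exp(-alt_depth * carriers)


def expected_allele_copies(n_samples, allele_frequency):
    """Expected copies of an allele among diploid samples."""
    return 2 * n_samples * allele_frequency


# One carrier at 4x coverage: the variant is missed fairly often.
single_carrier_miss = prob_alt_allele_unseen(4, 1)

# Five carriers pooled together: a miss becomes very unlikely.
five_carrier_miss = prob_alt_allele_unseen(4, 5)

# Expected copies of a 1% allele among the 2,504 final-phase samples.
copies = expected_allele_copies(2504, 0.01)

print(f"miss probability, 1 carrier:  {single_carrier_miss:.3f}")
print(f"miss probability, 5 carriers: {five_carrier_miss:.6f}")
print(f"expected copies of a 1% allele: {copies:.1f}")
```

Under these simplifying assumptions, a single 4x sample misses a heterozygous variant roughly one time in seven, while a 1% allele is expected to appear in about 50 chromosome copies across the final sample set, which is why pooling evidence across samples recovers most such variants.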
The data in this collection represents the final work of the 1000 Genomes Project, as completed in phase three of the project on GRCh37.
Some key files include:
- The GRCh37 reference genome used in this analysis
- Files listing the samples used in the work (.ped and panel)
- VCF files containing the variants detected and additional genotype VCF files listing genotypes for each individual at each variant location (provided per chromosome due to file size)
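As a minimal sketch of how the genotype VCF files might be read, the example below computes the alternate allele frequency at each site from the per-sample GT fields. It uses only the Python standard library and a made-up three-sample snippet rather than real project data; the function names and sample IDs are hypothetical, and the parsing assumes GT values written with `|` or `/` separators and `.` for missing calls.

```python
import io


def iter_data_lines(handle):
    """Yield VCF data lines, skipping '#'-prefixed header lines."""
    for line in handle:
        if not line.startswith("#"):
            yield line


def alt_allele_frequency(vcf_line):
    """Return (chrom, pos, fraction of called alleles that are non-reference)."""
    fields = vcf_line.rstrip("\n").split("\t")
    chrom, pos = fields[0], int(fields[1])
    gt_index = fields[8].split(":").index("GT")  # column 9 is FORMAT
    alt = total = 0
    for sample in fields[9:]:
        genotype = sample.split(":")[gt_index]
        for allele in genotype.replace("|", "/").split("/"):
            if allele == ".":
                continue  # missing call, not counted
            total += 1
            if allele != "0":
                alt += 1
    return chrom, pos, (alt / total if total else None)


# Made-up example content, not taken from the 1000 Genomes files.
EXAMPLE_VCF = (
    "##fileformat=VCFv4.1\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\tS3\n"
    "1\t1000\t.\tA\tG\t100\tPASS\t.\tGT\t0|0\t0|1\t1|1\n"
    "1\t2000\t.\tC\tT\t100\tPASS\t.\tGT\t0|0\t0|0\t.|.\n"
)

results = [alt_allele_frequency(line)
           for line in iter_data_lines(io.StringIO(EXAMPLE_VCF))]
for chrom, pos, freq in results:
    print(chrom, pos, freq)
```

For a downloaded per-chromosome file, the `io.StringIO` handle could be swapped for `gzip.open(path, "rt")`, since the files are distributed gzip-compressed; for files of that size a dedicated VCF library is likely a better fit than line-by-line parsing.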
- Access Restrictions
- Free to All
- Access Instructions
- Access via FTP server or IGSR website. Information on citation policies available via IGSR.
- Associated Publications
- Dataset Format(s)
- SRA, gzip, TBI, VCF