Document provenance of GRCh38 resource files in GATK bundle
Folks ask for this information on the forum occasionally but consistently, and our current answer is that we provide these files as is. For the new GATK4 documentation, which we plan to release on January 9th alongside GATK4, I think we should aim to be more transparent.
The doc team aims to have select documentation ready by December 13, 2017, in preparation for the release.
Those involved in creating the GRCh38 resource files, could you kindly provide READMEs to place alongside them?
For example: what processing steps were used to generate each file, and what is the original source (and version) of each resource? Thank you.
The files are as follows:
gs://genomics-public-data/resources/broad/hg38/v0/1000G.phase3.integrated.sites_only.no_MATCHED_REV.hg38.vcf
gs://genomics-public-data/resources/broad/hg38/v0/1000G.phase3.integrated.sites_only.no_MATCHED_REV.hg38.vcf.idx
gs://genomics-public-data/resources/broad/hg38/v0/1000G_omni2.5.hg38.vcf.gz
gs://genomics-public-data/resources/broad/hg38/v0/1000G_omni2.5.hg38.vcf.gz.tbi
gs://genomics-public-data/resources/broad/hg38/v0/1000G_phase1.snps.high_confidence.hg38.vcf.gz
gs://genomics-public-data/resources/broad/hg38/v0/1000G_phase1.snps.high_confidence.hg38.vcf.gz.tbi
gs://genomics-public-data/resources/broad/hg38/v0/Axiom_Exome_Plus.genotypes.all_populations.poly.hg38.vcf.gz
gs://genomics-public-data/resources/broad/hg38/v0/Axiom_Exome_Plus.genotypes.all_populations.poly.hg38.vcf.gz.tbi
gs://genomics-public-data/resources/broad/hg38/v0/Homo_sapiens_assembly38.dbsnp138.vcf
gs://genomics-public-data/resources/broad/hg38/v0/Homo_sapiens_assembly38.dbsnp138.vcf.idx
gs://genomics-public-data/resources/broad/hg38/v0/Homo_sapiens_assembly38.dict
gs://genomics-public-data/resources/broad/hg38/v0/Homo_sapiens_assembly38.fasta
gs://genomics-public-data/resources/broad/hg38/v0/Homo_sapiens_assembly38.fasta.64.alt
gs://genomics-public-data/resources/broad/hg38/v0/Homo_sapiens_assembly38.fasta.64.amb
gs://genomics-public-data/resources/broad/hg38/v0/Homo_sapiens_assembly38.fasta.64.ann
gs://genomics-public-data/resources/broad/hg38/v0/Homo_sapiens_assembly38.fasta.64.bwt
gs://genomics-public-data/resources/broad/hg38/v0/Homo_sapiens_assembly38.fasta.64.pac
gs://genomics-public-data/resources/broad/hg38/v0/Homo_sapiens_assembly38.fasta.64.sa
gs://genomics-public-data/resources/broad/hg38/v0/Homo_sapiens_assembly38.fasta.fai
gs://genomics-public-data/resources/broad/hg38/v0/Homo_sapiens_assembly38.known_indels.vcf.gz
gs://genomics-public-data/resources/broad/hg38/v0/Homo_sapiens_assembly38.known_indels.vcf.gz.tbi
gs://genomics-public-data/resources/broad/hg38/v0/Mills_and_1000G_gold_standard.indels.hg38.vcf.gz
gs://genomics-public-data/resources/broad/hg38/v0/Mills_and_1000G_gold_standard.indels.hg38.vcf.gz.tbi
gs://genomics-public-data/resources/broad/hg38/v0/hapmap_3.3.hg38.vcf.gz
gs://genomics-public-data/resources/broad/hg38/v0/hapmap_3.3.hg38.vcf.gz.tbi
gs://genomics-public-data/resources/broad/hg38/v0/wgs_calling_regions.hg38.interval_list
gs://genomics-public-data/resources/broad/hg38/v0/scattered_calling_intervals/
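
To help scope the READMEs, the list above can be grouped so that each primary resource appears once, with its index and companion files folded under it. Below is a minimal Python sketch of that grouping. It assumes a companion file is named by appending a dot-suffix to the primary file name (for example, `x.vcf.gz.tbi` belongs to `x.vcf.gz`). The helper `group_companions` and the checklist format are hypothetical and not part of GATK.

```python
# Hypothetical helper for drafting a README checklist; not part of GATK.
PREFIX = "gs://genomics-public-data/resources/broad/hg38/v0/"


def group_companions(paths):
    """Group each path under the listed path it extends with a '.'-suffix.

    Assumption: an index or companion file is named <primary>.<suffix>,
    e.g. x.vcf.gz.tbi belongs to x.vcf.gz.
    """
    groups = {}
    # Shorter paths first, so a primary file is always seen before its companions.
    for path in sorted(paths, key=len):
        parent = next((p for p in groups if path.startswith(p + ".")), None)
        if parent is None:
            groups[path] = []  # no listed parent: treat as a primary resource
        else:
            groups[parent].append(path)
    return groups


# A few entries from the list above, for illustration only.
example = [
    PREFIX + name
    for name in (
        "Homo_sapiens_assembly38.fasta",
        "Homo_sapiens_assembly38.fasta.fai",
        "Homo_sapiens_assembly38.fasta.64.alt",
        "Homo_sapiens_assembly38.dict",
        "hapmap_3.3.hg38.vcf.gz",
        "hapmap_3.3.hg38.vcf.gz.tbi",
    )
]

for primary, companions in group_companions(example).items():
    print(f"- [ ] README for {primary[len(PREFIX):]}")
    for companion in companions:
        print(f"      also covers {companion[len(PREFIX):]}")
```

Under this naming assumption, `Homo_sapiens_assembly38.dict` stays a separate entry because it does not extend the `.fasta` name, so it would need to be attached to the reference README by hand.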