hap.py
Inconsistent output for runs with and without stratification files
I ran hap.py twice on the same VCF file: the first time without a stratification file, and the second time with stratification for GC content and low-complexity regions. I am puzzled by the variation in the tool's output: I get different precision and recall values for the same stratification category calculated by hap.py, depending on the run.
Why do I observe this variation?
Run      Type   Subtype   Subset       Filter  Genotype  QQ.Field  QQ  METRIC.Recall  METRIC.Precision
1st run  INDEL  I16_PLUS  TS_boundary  PASS    *         QUAL      *   0.5            0.857143
2nd run  INDEL  I16_PLUS  TS_boundary  PASS    *         QUAL      *   0.546392       0.928571
The following command was used to run hap.py the first time, without the stratification option:
sudo docker run -it \
-v `pwd`:/data \
pkrusche/hap.py \
/opt/hap.py/bin/hap.py \
/data/FDAPrecision/GIAB_latest/HG001_GRCh37_GIAB_highconf_CG-IllFB-IllGATKHC-Ion-10X-SOLID_CHROM1-X_v.3.3.2_highconf_PGandRTGphasetransfer.vcf.gz \
/data/SAMPLE.HG001-NA12878.vcf.gz \
-f /data/FDAPrecision/GIAB_latest/HG001_GRCh37_GIAB_highconf_CG-IllFB-IllGATKHC-Ion-10X-SOLID_CHROM1-X_v.3.3.2_highconf_nosomaticdel.bed \
-r /data/references/GRCh37_Homo_sapiens/Homo_sapiens.GRCh37.dna.primary_assembly.fa \
--verbose \
--logfile /data/out_dir/log.txt \
-o /data/out_dir/SAMPLE
The following command was used to run hap.py the second time, with the stratification option:
sudo docker run -it \
-v `pwd`:/data \
pkrusche/hap.py \
/opt/hap.py/bin/hap.py \
/data/FDAPrecision/GIAB_latest/HG001_GRCh37_GIAB_highconf_CG-IllFB-IllGATKHC-Ion-10X-SOLID_CHROM1-X_v.3.3.2_highconf_PGandRTGphasetransfer.vcf.gz \
/data/SAMPLE.HG001-NA12878.vcf.gz \
-f /data/FDAPrecision/GIAB_latest/HG001_GRCh37_GIAB_highconf_CG-IllFB-IllGATKHC-Ion-10X-SOLID_CHROM1-X_v.3.3.2_highconf_nosomaticdel.bed \
-r /data/references/GRCh37_Homo_sapiens/Homo_sapiens.GRCh37.dna.primary_assembly.fa \
--stratification /data/LowComplexity_GC.tsv \
--verbose \
--logfile /data/out_dir/log.txt \
-o /data/out_dir/SAMPLE
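Both commands above write to the same output prefix (`-o /data/out_dir/SAMPLE`) and the same log file, so the second run overwrites the first. To compare the two runs side by side, each run would need its own prefix. The sketch below then pulls one row out of each run's per-stratum table. It assumes that table is `<prefix>.extended.csv`, a plain comma-separated file whose header contains the column names shown in the table above; `show_stratum` is a hypothetical helper, not part of hap.py.

```shell
#!/usr/bin/env bash
# Print METRIC.Recall and METRIC.Precision for one row of a hap.py
# extended.csv file. Assumptions: plain comma-separated file with a header
# row naming the columns shown in the question; show_stratum is hypothetical.
show_stratum() {
  local csv=$1 type=$2 subtype=$3 subset=$4 filter=$5 genotype=$6
  awk -F',' -v t="$type" -v st="$subtype" -v ss="$subset" \
      -v f="$filter" -v g="$genotype" '
    NR == 1 {
      # Remember the position of every column by its header name.
      for (i = 1; i <= NF; i++) col[$i] = i
      next
    }
    # Keep only the row matching all five key columns.
    $(col["Type"]) == t && $(col["Subtype"]) == st &&
    $(col["Subset"]) == ss && $(col["Filter"]) == f &&
    $(col["Genotype"]) == g {
      print $(col["METRIC.Recall"]), $(col["METRIC.Precision"])
    }
  ' "$csv"
}
```

For example, with hypothetical prefixes `SAMPLE_nostrat` and `SAMPLE_strat`, `show_stratum /data/out_dir/SAMPLE_nostrat.extended.csv INDEL I16_PLUS TS_boundary PASS '*'` would print the recall and precision for the row in question. Quote the `*` so the shell does not expand it as a glob.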
Here is a sample of the stratification file:
gc15 /data/GA4GH/benchmarking-tools/resources/stratification-bed-files/GCcontent/human_g1k_v37_l100_gc15_slop50.bed.gz
AllRepeats_51to200bp_gt95identity_merged /data/GA4GH/benchmarking-tools/resources/stratification-bed-files/LowComplexity/AllRepeats_51to200bp_gt95identity_merged.bed.gz
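Each entry in the stratification list pairs a region name with the path to a BED file. In the sample above the two columns appear separated by a space, which may just be how the post was rendered. The sketch below sanity-checks such a list before a run, assuming the file is tab-separated, as its `.tsv` name suggests. Paths are checked as seen by the shell running the function, so it should be run where hap.py sees the same paths (for example, inside the container, where `/data` is the mounted directory). `check_strat_tsv` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sanity-check a hap.py stratification list before a run.
# Assumptions: each non-empty line is "<region name><TAB><path to BED file>",
# and paths are checked as seen by this shell. check_strat_tsv is hypothetical.
check_strat_tsv() {
  local tsv=$1 tab status=0 lineno=0 name path extra
  tab=$(printf '\t')
  # "|| [ -n ... ]" also processes a final line that lacks a trailing newline.
  while IFS=$tab read -r name path extra || [ -n "$name" ]; do
    lineno=$((lineno + 1))
    [ -z "$name" ] && continue   # skip blank lines
    if [ -z "$path" ] || [ -n "$extra" ]; then
      echo "line $lineno: expected 2 tab-separated fields" >&2
      status=1
    elif [ ! -f "$path" ]; then
      echo "line $lineno: file not found: $path" >&2
      status=1
    fi
  done < "$tsv"
  return "$status"
}
```

For example, `check_strat_tsv /data/LowComplexity_GC.tsv && echo "stratification list looks OK"` reports any line that is not split by a tab or that points to a missing file, and returns a non-zero status if it finds one.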
The default xcmp comparison engine was used. The hap.py version installed in the container is 0.3.8-17-gf15de4a.