
Inconsistent output for runs with and without stratification files

Open ofonov opened this issue 6 years ago • 0 comments

I ran hap.py twice on the same VCF file: first without a stratification file, and then with stratification for GC content and low-complexity regions. I am puzzled that the output differs between the two runs: I get different precision and recall values for the same stratification category calculated by hap.py.

Why do I observe this variation?

Run	Type	Subtype	Subset	Filter	Genotype	QQ.Field	QQ	METRIC.Recall	METRIC.Precision
1st run	INDEL	I16_PLUS	TS_boundary	PASS	*	QUAL	*	0.5	0.857143
2nd run	INDEL	I16_PLUS	TS_boundary	PASS	*	QUAL	*	0.546392	0.928571

The following command was used for the first run, without the stratification option:

sudo docker run -it \
   -v `pwd`:/data \
   pkrusche/hap.py \
   /opt/hap.py/bin/hap.py \
   /data/FDAPrecision/GIAB_latest/HG001_GRCh37_GIAB_highconf_CG-IllFB-IllGATKHC-Ion-10X-SOLID_CHROM1-X_v.3.3.2_highconf_PGandRTGphasetransfer.vcf.gz \
   /data/SAMPLE.HG001-NA12878.vcf.gz \
   -f /data/FDAPrecision/GIAB_latest/HG001_GRCh37_GIAB_highconf_CG-IllFB-IllGATKHC-Ion-10X-SOLID_CHROM1-X_v.3.3.2_highconf_nosomaticdel.bed \
   -r /data/references/GRCh37_Homo_sapiens/Homo_sapiens.GRCh37.dna.primary_assembly.fa \
   --verbose \
   --logfile /data/out_dir/log.txt \
   -o /data/out_dir/SAMPLE

The following command was used for the second run, with the stratification option:

sudo docker run -it \
  -v `pwd`:/data \
  pkrusche/hap.py \
  /opt/hap.py/bin/hap.py \
  /data/FDAPrecision/GIAB_latest/HG001_GRCh37_GIAB_highconf_CG-IllFB-IllGATKHC-Ion-10X-SOLID_CHROM1-X_v.3.3.2_highconf_PGandRTGphasetransfer.vcf.gz \
  /data/SAMPLE.HG001-NA12878.vcf.gz \
  -f /data/FDAPrecision/GIAB_latest/HG001_GRCh37_GIAB_highconf_CG-IllFB-IllGATKHC-Ion-10X-SOLID_CHROM1-X_v.3.3.2_highconf_nosomaticdel.bed \
  -r /data/references/GRCh37_Homo_sapiens/Homo_sapiens.GRCh37.dna.primary_assembly.fa \
  --stratification /data/LowComplexity_GC.tsv \
  --verbose \
  --logfile /data/out_dir/log.txt \
  -o /data/out_dir/SAMPLE

Here is a sample of the stratification file:

    gc15	/data/GA4GH/benchmarking-tools/resources/stratification-bed-files/GCcontent/human_g1k_v37_l100_gc15_slop50.bed.gz
    AllRepeats_51to200bp_gt95identity_merged	/data/GA4GH/benchmarking-tools/resources/stratification-bed-files/LowComplexity/AllRepeats_51to200bp_gt95identity_merged.bed.gz

The default hap.py comparison engine (xcmp) was used. hap.py version installed in the container: 0.3.8-17-gf15de4a
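
To find every row that differs between the two runs, not just the one quoted above, the two result tables could be compared row by row. The following is a minimal sketch, not part of hap.py. It assumes each run is written to its own output prefix (the commands above share `/data/out_dir/SAMPLE`, so the second run would overwrite the first) and that each run's table is a CSV with the column names shown in the table above. The function names and the file names in the usage note are hypothetical.

```python
import csv

# Columns that identify a row, and the metrics to compare (names taken from the table above).
KEY_COLUMNS = ["Type", "Subtype", "Subset", "Filter", "Genotype", "QQ.Field", "QQ"]
METRIC_COLUMNS = ["METRIC.Recall", "METRIC.Precision"]


def load_rows(path):
    """Read a CSV results table into a list of dicts, one per row."""
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle))


def diff_metrics(rows_a, rows_b, keys=KEY_COLUMNS, metrics=METRIC_COLUMNS):
    """Return {row_key: {metric: (value_a, value_b)}} for rows that are
    present in both inputs but have at least one differing metric.

    Values are compared as the strings found in the files.
    """
    index_b = {tuple(row[k] for k in keys): row for row in rows_b}
    differences = {}
    for row in rows_a:
        key = tuple(row[k] for k in keys)
        other = index_b.get(key)
        if other is None:
            # Row exists in only one run, e.g. a subset added by --stratification.
            continue
        changed = {m: (row[m], other[m]) for m in metrics if row[m] != other[m]}
        if changed:
            differences[key] = changed
    return differences
```

Calling `diff_metrics(load_rows("run1.extended.csv"), load_rows("run2.extended.csv"))` with the two (hypothetically named) result files would then list each shared row whose recall or precision changed.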

ofonov, Mar 28 '19 16:03