hap.py
Docker
Hi Peter, I have a couple of questions, please:
- I tried to define the reference location both by passing the -r parameter and by exporting the HGREF variable, but the TRUTH columns of the result table are empty (all zeros), which I guess corresponds to the non-reference counts at the beginning of the run. What am I doing wrong?
[exomeuser@labsrv4 TwistExomeRefSeq_NA12878HG001_S1_4]$ export HGREF=/nadata/data/exomeseq/workspace/TwistExomeRefSeq_NA12878HG001_S1_4/compare_to_ref/hg19.fa
[exomeuser@labsrv4 TwistExomeRefSeq_NA12878HG001_S1_4]$ cd compare_to_ref/
[exomeuser@labsrv4 compare_to_ref]$ docker run -it -v `pwd`:/data pkrusche/hap.py /opt/hap.py/bin/hap.py /data/HG001_GRCh37_GIAB_highconf_CG-IllFB-IllGATKHC-Ion-10X-SOLID_CHROM1-X_v.3.3.2_highconf.vcf /data/combined_TwistExomeRefSeq_NA12878HG001_S1_4.recal.all_edit.vcf -f /data/intersect_beds.bed -r /data/hg19.fa -o /data/test
2019-05-30 10:29:42,919 WARNING No reference file found at default locations. You can set the environment variable 'HGREF' or 'HG19' to point to a suitable Fasta file.
2019-05-30 10:29:42,948 WARNING No reference file found at default locations. You can set the environment variable 'HGREF' or 'HG19' to point to a suitable Fasta file.
[I] Total VCF records: 3775119
[I] Non-reference VCF records: 3775119
[I] Total VCF records: 37554
[I] Non-reference VCF records: 34867
Benchmarking Summary:
Type   Filter  TRUTH.TOTAL  TRUTH.TP  TRUTH.FN  QUERY.TOTAL  QUERY.FP  QUERY.UNK  FP.gt  METRIC.Recall  METRIC.Precision  METRIC.Frac_NA  METRIC.F1_Score  TRUTH.TOTAL.TiTv_ratio  QUERY.TOTAL.TiTv_ratio  TRUTH.TOTAL.het_hom_ratio  QUERY.TOTAL.het_hom_ratio
INDEL  ALL     0            0         0         2713         0         2713       0      0              NaN               1               NaN              NaN                     NaN                     NaN                        2.424936
INDEL  PASS    0            0         0         2713         0         2713       0      0              NaN               1               NaN              NaN                     NaN                     NaN                        2.424936
SNP    ALL     0            0         0         32184        0         32184      0      0              NaN               1               NaN              NaN                     2.599955                NaN                        1.832658
SNP    PASS    0            0         0         32184        0         32184      0      0              NaN               1               NaN              NaN                     2.599955                NaN                        1.832658
- As I understand it, the Docker image doesn't consider complex variants. Does that mean I need a real installation if I want to use rtg-tools?
- The documentation says the software can be installed on CentOS 7, but it also says I need g++ 4.9.x, while the most recent RPM for CentOS is 4.8.5. Am I missing something?
Thank you, Katie
Hi Katie,
- The environment variable HGREF is set outside the Docker container, to a location that is not accessible from inside it. The location needs to be given relative to a path mounted inside the container. In your example above you can ignore the HGREF-related warning, since you are specifying the reference location explicitly as /data/hg19.fa (which should work, judging by your other commands). HGREF is only used as a default if no reference location is specified.
A reason for the 0 counts could be that the chromosome names are not consistent. GIAB typically uses numerical names (1, 2, 3, ...), whereas hg19 uses chr-prefixed names (chr1, chr2, ...). If all data is GRCh37/hg19, the simplest fix would be to strip the chr prefix from the query file.
- The Docker image should contain vcfeval; also, hap.py can handle complex variants to some degree without vcfeval. To run with vcfeval you can specify --engine=vcfeval
- To get more recent C++ compilers on older CentOS, you can use the devtoolset versions: https://www.softwarecollections.org/en/scls/rhscl/devtoolset-7/ ; there is also a fix available which should (theoretically) make it work with 4.8.x: https://github.com/Illumina/hap.py/pull/82
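On the chromosome-naming point in the first answer, here is a minimal Python sketch of how one might check for the mismatch and strip the chr prefix from the query VCF. It is not part of hap.py; the function names chrom_names and strip_chr_prefix are made up for illustration, and it assumes an uncompressed, tab-separated VCF whose lines are read as text.

```python
def chrom_names(vcf_lines):
    """Collect the chromosome names used in the data lines of a VCF."""
    return {
        line.split("\t", 1)[0]
        for line in vcf_lines
        if line.strip() and not line.startswith("#")
    }


def strip_chr_prefix(vcf_lines):
    """Yield VCF lines with a leading 'chr' removed from contig names."""
    for line in vcf_lines:
        if line.startswith("##contig=<ID=chr"):
            # Keep header contig declarations consistent with the records.
            yield line.replace("##contig=<ID=chr", "##contig=<ID=", 1)
        elif not line.startswith("#") and line.startswith("chr"):
            # Data line: the first column is the chromosome name.
            yield line[3:]
        else:
            yield line


# Tiny made-up records, only to show the effect.
truth = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\n",
    "1\t100\t.\tA\tG\n",
]
query = [
    "##fileformat=VCFv4.2\n",
    "##contig=<ID=chr1,length=249250621>\n",
    "#CHROM\tPOS\tID\tREF\tALT\n",
    "chr1\t100\t.\tA\tG\n",
]

shared_before = chrom_names(truth) & chrom_names(query)
fixed = list(strip_chr_prefix(query))
shared_after = chrom_names(truth) & chrom_names(fixed)
print(shared_before, shared_after)
```

If the two files share no chromosome names before the fix, that would be consistent with the all-zero TRUTH columns. Presumably the BED file passed with -f and the reference FASTA need to use the same naming as well. Note that this simple rename turns chrM into M rather than MT, which should not matter here because the truth set only covers chromosomes 1-X.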
Hope this helps! Peter