hap.py icon indicating copy to clipboard operation
hap.py copied to clipboard

Reference header in output vcf hardcoded to HG19

Open kachulis opened this issue 7 years ago • 0 comments

Running hap.py with HG38 vcfs and an HG38 reference and outputting an annotated vcf, the evaluation appears to use the correct reference, however the dictionary in the output vcf header has HG19 contigs.

/opt/hap.py/bin/hap.py na12878.hg38.truth_set.vcf.gz na12878.hg38.eval_set.vcf.gz -f confidence.bed -r Homo_sapiens_assembly38.fasta -V -L --preprocess-truth -o na12878_hg38_happy leads to an output vcf which begins:

##fileformat=VCFv4.1
##FILTER=<ID=PASS,Description="All filters passed">
##reference=hg19
##contig=<ID=chr1,length=249250621>
##contig=<ID=chr2,length=243199373>
##contig=<ID=chr3,length=198022430>
##contig=<ID=chr4,length=191154276>
##contig=<ID=chr5,length=180915260>
##contig=<ID=chr6,length=171115067>
##contig=<ID=chr7,length=159138663>
##contig=<ID=chr8,length=146364022>
##contig=<ID=chr9,length=141213431>
##contig=<ID=chr10,length=135534747>
##contig=<ID=chr11,length=135006516>
##contig=<ID=chr12,length=133851895>
##contig=<ID=chr13,length=115169878>
##contig=<ID=chr14,length=107349540>
##contig=<ID=chr15,length=102531392>
##contig=<ID=chr16,length=90354753>
##contig=<ID=chr17,length=81195210>
##contig=<ID=chr18,length=78077248>
##contig=<ID=chr19,length=59128983>
##contig=<ID=chr20,length=63025520>
##contig=<ID=chr21,length=48129895>
##contig=<ID=chr22,length=51304566>
##contig=<ID=chrX,length=155270560>
##INFO=<ID=END,Number=.,Type=Integer,Description="SV end position">
##INFO=<ID=IMPORT_FAIL,Number=.,Type=Flag,Description="Flag to identify variants that could not be imported.">
##FORMAT=<ID=AGT,Number=1,Type=String,Description="Genotypes at ambiguous locations">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=GQ,Number=1,Type=Float,Description="Genotype Quality">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read Depth">
##FORMAT=<ID=AD,Number=A,Type=Integer,Description="Allele Depths">
##FORMAT=<ID=ADO,Number=.,Type=Integer,Description="Summed depth of non-called alleles.">

It appears that this may be due to the constructor of VariantWriterImpl calling bcfhelpers::bcfHeaderHG19(hdr), which hardcodes an HG19 header into the output file.

kachulis avatar Aug 03 '18 17:08 kachulis