hap.py
hap.py copied to clipboard
Reference header in output vcf hardcoded to HG19
Running hap.py with HG38 vcfs and an HG38 reference and outputting an annotated vcf, the evaluation appears to use the correct reference, however the dictionary in the output vcf header has HG19 contigs.
/opt/hap.py/bin/hap.py na12878.hg38.truth_set.vcf.gz na12878.hg38.eval_set.vcf.gz -f confidence.bed -r Homo_sapiens_assembly38.fasta -V -L --preprocess-truth -o na12878_hg38_happy
leads to an output vcf which begins:
##fileformat=VCFv4.1
##FILTER=<ID=PASS,Description="All filters passed">
##reference=hg19
##contig=<ID=chr1,length=249250621>
##contig=<ID=chr2,length=243199373>
##contig=<ID=chr3,length=198022430>
##contig=<ID=chr4,length=191154276>
##contig=<ID=chr5,length=180915260>
##contig=<ID=chr6,length=171115067>
##contig=<ID=chr7,length=159138663>
##contig=<ID=chr8,length=146364022>
##contig=<ID=chr9,length=141213431>
##contig=<ID=chr10,length=135534747>
##contig=<ID=chr11,length=135006516>
##contig=<ID=chr12,length=133851895>
##contig=<ID=chr13,length=115169878>
##contig=<ID=chr14,length=107349540>
##contig=<ID=chr15,length=102531392>
##contig=<ID=chr16,length=90354753>
##contig=<ID=chr17,length=81195210>
##contig=<ID=chr18,length=78077248>
##contig=<ID=chr19,length=59128983>
##contig=<ID=chr20,length=63025520>
##contig=<ID=chr21,length=48129895>
##contig=<ID=chr22,length=51304566>
##contig=<ID=chrX,length=155270560>
##INFO=<ID=END,Number=.,Type=Integer,Description="SV end position">
##INFO=<ID=IMPORT_FAIL,Number=.,Type=Flag,Description="Flag to identify variants that could not be imported.">
##FORMAT=<ID=AGT,Number=1,Type=String,Description="Genotypes at ambiguous locations">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=GQ,Number=1,Type=Float,Description="Genotype Quality">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read Depth">
##FORMAT=<ID=AD,Number=A,Type=Integer,Description="Allele Depths">
##FORMAT=<ID=ADO,Number=.,Type=Integer,Description="Summed depth of non-called alleles.">
It appears that this may be due to the constructor of VariantWriterImpl calling bcfhelpers::bcfHeaderHG19(hdr), which hardcodes an HG19 header into the output file.