HTSlib should fail on trailing INFO garbage

Open yangyxt opened this issue 3 years ago • 3 comments

The version I use is 1.11. The command I ran is: bcftools view -R <target_region>.tsv -Oz -o <output_path>.vcf.gz <input_path>.vcf.gz

The VCF file is the golden VCF file from simulated data. The input VCF file looks like this: [image]

The output VCF file looks like this: [image]

Note the part marked by the red circle: the end of the row (the trailing slash and the last digit) is silently cut off. Please take a look at this issue and let me know how I can resolve it. Thanks!

yangyxt, Mar 09 '21 03:03

This is partly a problem with your VCF, partly with HTSlib:

  1. The header says the WP field is an integer with Number=A values. If so, the values in the body should be comma-separated, not slash-separated. The number of values is also wrong.

  2. However, the library should fail, or at least print a warning, on the broken INFO record.
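
To make point 1 concrete, here is a minimal sketch of the kind of check one could run on a single INFO value. It is not HTSlib code: the function name `check_number_a_integer` and the example values are hypothetical, and the rule it encodes (Number=A means one comma-separated value per ALT allele) is the only part taken from the VCF convention discussed above.

```python
import re
from typing import List

# Hypothetical helper, not part of HTSlib or bcftools.
_INT = re.compile(r"[+-]?[0-9]+")

def check_number_a_integer(alt: str, value: str) -> List[str]:
    """Return the problems found in one Number=A, Type=Integer INFO value.

    alt   -- the ALT column of the record, e.g. "T" or "T,G"
    value -- the raw text after "TAG=" in the INFO column
    """
    problems = []
    # Number=A: one value per alternate allele.
    n_alt = 0 if alt == "." else len(alt.split(","))
    fields = value.split(",")
    if len(fields) != n_alt:
        problems.append(f"expected {n_alt} value(s), found {len(fields)}")
    for field in fields:
        # "." is the missing-value marker; anything else must be an integer.
        if field != "." and not _INT.fullmatch(field):
            problems.append(f"not an integer: {field!r}")
    return problems

# Made-up values for illustration only (the real ones were in the screenshots).
print(check_number_a_integer("T", "3/1"))    # slash-separated value
print(check_number_a_integer("T,G", "3,1"))  # well-formed value
```

A slash-separated value such as the hypothetical `3/1` fails the integer check, which is the kind of record the library would ideally reject or warn about.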

pd3, Mar 09 '21 17:03

Thanks for the response! In this case, how should I modify my VCF file to make it valid?

Btw, bcftools view prints no warning messages.

yangyxt, Mar 10 '21 13:03

I don't know what the intention is, but it would probably be best to redefine the tag in the header as Type=String. That way the value will be preserved as written.

pd3, Mar 16 '21 14:03
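
As a rough illustration of the Type=String suggestion, the sketch below rewrites one header line in plain Python. The function name `retype_info_as_string` and the example header line (including its Description text) are assumptions made for illustration. The sketch changes only the Type and leaves Number untouched; whether Number should also change depends on the data.

```python
import re

# Hypothetical helper, not part of HTSlib or bcftools.
def retype_info_as_string(header_line: str, tag: str) -> str:
    """Return header_line with Type=String if it defines INFO tag `tag`.

    Lines that do not define that tag are returned unchanged.
    """
    if not header_line.startswith(f"##INFO=<ID={tag},"):
        return header_line
    # Replace only the first Type=... key, which precedes Description.
    return re.sub(r",Type=[A-Za-z]+", ",Type=String", header_line, count=1)

# Made-up header line; the real definition of WP is not shown in the thread.
line = '##INFO=<ID=WP,Number=A,Type=Integer,Description="hypothetical">'
print(retype_info_as_string(line, "WP"))
```

With bcftools itself, the same change can presumably be made by dumping the header with `bcftools view -h`, editing the WP line, and installing the edited header with `bcftools reheader -h`. Check the manual for your version before relying on this.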