Remove validation of VCF Header line field order?

Open tfenne opened this issue 3 years ago • 1 comments

@lbergelson, @droazen and anyone else who may be interested. Following the discussion in https://github.com/samtools/hts-specs/issues/642 would there be support for (or any objections to) a PR that eliminated the validation of ordering of fields within a given VCF header line?

This issue came up because an old version of one of the GATK's SV tools produces this header line:

##INFO=<ID=END2,Type=Integer,Number=1,Description="Position of breakpoint on CHR2">

instead of the more common:

##INFO=<ID=END2,Number=1,Type=Integer,Description="Position of breakpoint on CHR2">

The discussion on the spec issue hasn't led to a PR yet, but there seems to be consensus on clarifying the language to make it clear that no ordering of fields is required within a single header line. I'm not really sure why HTSJDK validates this in the first place, and the validation makes the header parsing code quite a bit more complicated too. I'd like to submit a PR to remove the check, but would appreciate knowing in advance whether folks are receptive to it.
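
To make the idea concrete, here is a rough sketch of what order-independent parsing of a header line's fields could look like. This is not HTSJDK's actual parsing code: the class `HeaderLineFieldsSketch` and the method `parseFields` are hypothetical names made up for illustration, and the sketch assumes only the simple `<key=value,...>` form shown above, with double-quoted values that may contain commas.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; not part of HTSJDK.
final class HeaderLineFieldsSketch {

    // Parses the <...> body of a structured header line into key/value pairs.
    // Fields are stored in the order they appear, but nothing checks that order.
    static Map<String, String> parseFields(String line) {
        int open = line.indexOf('<');
        int close = line.lastIndexOf('>');
        if (open < 0 || close < open) {
            throw new IllegalArgumentException("Not a structured header line: " + line);
        }
        String body = line.substring(open + 1, close);
        Map<String, String> fields = new LinkedHashMap<>();
        StringBuilder key = new StringBuilder();
        StringBuilder value = new StringBuilder();
        boolean inValue = false;
        boolean inQuotes = false;
        for (int i = 0; i < body.length(); i++) {
            char c = body.charAt(i);
            if (inQuotes && c == '\\' && i + 1 < body.length()) {
                value.append(body.charAt(++i)); // keep the escaped character, drop the backslash
            } else if (c == '"') {
                inQuotes = !inQuotes;           // quotes delimit the value and are not stored
            } else if (c == ',' && !inQuotes) {
                fields.put(key.toString(), value.toString()); // an unquoted comma ends a field
                key.setLength(0);
                value.setLength(0);
                inValue = false;
            } else if (c == '=' && !inValue) {
                inValue = true;                 // the first '=' separates key from value
            } else if (inValue) {
                value.append(c);
            } else {
                key.append(c);
            }
        }
        if (key.length() > 0) {
            fields.put(key.toString(), value.toString()); // flush the last field
        }
        return fields;
    }

    public static void main(String[] args) {
        Map<String, String> typeFirst = parseFields(
            "##INFO=<ID=END2,Type=Integer,Number=1,Description=\"Position of breakpoint on CHR2\">");
        Map<String, String> numberFirst = parseFields(
            "##INFO=<ID=END2,Number=1,Type=Integer,Description=\"Position of breakpoint on CHR2\">");
        // Map equality ignores insertion order, so both lines yield the same fields.
        System.out.println(typeFirst.equals(numberFirst)); // prints "true"
    }
}
```

With fields collected into a map like this, the two header lines above are treated identically, and any remaining validation (required keys present, valid `Type`, and so on) can be done by key lookup rather than by position.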

cc @nh13

tfenne avatar May 27 '22 16:05 tfenne

I'd love to see that code removed - it's a pretty awkward way to do the order validation anyway. I'd prefer to see it done as part of https://github.com/samtools/htsjdk/pull/1581 though (I'm happy to make the changes there), since that PR already changes the same public APIs that would have to change for this.

cmnbroad avatar May 31 '22 18:05 cmnbroad