[Feature Request] Cannot sort using custom contig order
Hi, I have some VCFs where the sequence dictionary in the header is out of the canonical order because of a tool's decisions. I'd like to be able to sort my VCF to follow the "usual" order, in other words to sort according to a custom order of the contigs, e.g. the order from a reference fai.
Here are two ways new features in bcftools could allow for this.
- Update `bcftools reheader -f ref.fai` to also force the new header ordering of the contigs to match the order in the `ref.fai`. I'd imagine this is the simplest to implement, and then can be followed with a `bcftools sort` to get the entries to match this ordering, but is not strictly backwards compatible since behavior of an existing flag would change.
- Update `bcftools sort` to include a `-f ref.fai` input to do both of the things described above: update the header to have sequence dict matching the order of the input, and sort all the records according to this order.
Unless I am missing something, there is currently no way to (easily) achieve this with bcftools.
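To make the header half of the request concrete, here is a minimal Python sketch of the reordering I have in mind. It is not bcftools code: the helper names `read_fai_order` and `reorder_contig_lines` are hypothetical, and it assumes the header is plain text with one `##contig=<ID=...>` line per contig. Contigs missing from the fai are kept, in their original relative order, after the known ones.

```python
import re

# Matches a VCF header contig line and captures the contig ID.
CONTIG_ID = re.compile(r"^##contig=<.*?\bID=([^,>]+)")


def read_fai_order(fai_lines):
    """Return contig names in .fai order (first tab-separated column)."""
    return [line.split("\t", 1)[0].strip() for line in fai_lines if line.strip()]


def reorder_contig_lines(header_lines, fai_order):
    """Return a copy of the header with ##contig lines in fai order.

    Non-contig lines stay where they are; the contig lines are permuted
    among the positions they originally occupied.
    """
    rank = {name: i for i, name in enumerate(fai_order)}
    slots, contigs = [], []
    for idx, line in enumerate(header_lines):
        match = CONTIG_ID.match(line)
        if match:
            slots.append(idx)
            contigs.append((match.group(1), line))
    # list.sort is stable, so contigs absent from the fai keep their
    # original relative order and land after all known contigs.
    contigs.sort(key=lambda c: rank.get(c[0], len(rank)))
    out = list(header_lines)
    for idx, (_, line) in zip(slots, contigs):
        out[idx] = line
    return out
```

The reordered header could then be written to a file and passed to `bcftools reheader -h`, followed by `bcftools sort`. I have not tested that pipeline, and I would only expect it to be safe for plain-text VCF.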
I am not opposed to adding this feature, but it is unlikely to happen by my doing. What is the motivation for this request? The VCF specification does not mandate any specific order of the contigs, so programs should not be relying on it.
The motivation is that some tools write records unsorted, and then you can only sort according to the sequence dictionary in the header using bcftools sort. This means that if you want to do anything where you iterate over a family of files (e.g. your VCF, a BED file, a BAM), you'd be unable to traverse them "together", since they would be sorted according to different conventions. It would be great to be able to coerce the ordering in your VCF to match the "normal" convention that all your other files follow.
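As an illustration of what "the same convention" means for the records themselves, here is a rough Python sketch that sorts tab-separated record lines by a custom contig order and then by position. The names `contig_sort_key` and `sort_records` are made up for this example, it holds everything in memory, and it is not meant to reflect how bcftools sort works internally.

```python
def contig_sort_key(contig_order):
    """Build a sort key that ranks records by custom contig order, then POS."""
    rank = {name: i for i, name in enumerate(contig_order)}

    def key(record_line):
        # CHROM and POS are the first two tab-separated VCF columns.
        chrom, pos = record_line.split("\t", 2)[:2]
        # Contigs not in the custom order sort after all known ones.
        return (rank.get(chrom, len(rank)), int(pos))

    return key


def sort_records(record_lines, contig_order):
    """Return the record lines sorted by (contig rank, position)."""
    return sorted(record_lines, key=contig_sort_key(contig_order))
```

Applying the same contig ranking to every file in the family (for a BED file, with the start column in place of POS) is what would let them be traversed in lockstep.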