htslib icon indicating copy to clipboard operation
htslib copied to clipboard

Duplicate Sample IDs in VCF aborts

Open david4096 opened this issue 7 years ago • 4 comments

(via pysam) A VCF that has repeated sample names causes htslib to crash. An offending VCF can be found here: https://github.com/ga4gh/server/tree/master/tests/faultydata/variants/duplicated_sampleid .

As described here: https://github.com/ga4gh/server/issues/521

david4096 avatar Oct 28 '16 18:10 david4096

bcf_hdr_add_sample() aborts on certain errors, which some people misinterpret as crashing. Being a library function, it should really be returning an error code so the caller can decide what to do about the problem. (Note that if hts_verbose < 2 it does just return an error code in this case — this is clearly wrong as such behaviour should never depend on the verbosity setting.)

Fellow maintainers: Is there a reason for aborting here (would the resulting data structures be too broken to continue?), or is this just a legacy of the early htslib code?

jmarshall avatar Oct 31 '16 11:10 jmarshall

Please note that the abort() command was intentionally added to resolve bcftools issue 184 ("Wrong genotypes if duplicate sample names").

So, it is not appropriate to simply remove it.

kirkmcclure avatar May 29 '17 17:05 kirkmcclure

Abort is never an appropriate action for a library call unless there is genuingly nothing else it could do or if the situation should really never occur (in which case if we reached it then something very odd has happened, possibly memory corruption or some foul play is at hand and continuing could be even more problematic).

This situation is simply a matter of reporting an error and returning with the correct error code. I haven't looked into seeing how easy that is to do in this code path.

jkbonfield avatar May 30 '17 07:05 jkbonfield

Just to confirm (as it was me who added the abort) - a proper error propagation is preferred indeed. However, if the abort is going to be removed, it would be great if bcftools could be made aware of the change.

pd3 avatar May 30 '17 11:05 pd3