hap.py icon indicating copy to clipboard operation
hap.py copied to clipboard

Buffer overflow in hap.py’s `preprocess` program

Open ThomasHickman opened this issue 6 years ago • 1 comments

At https://github.com/Illumina/hap.py/blob/6c907ce3b02956bc239022db6edea7c48d6ddb8b/src/c++/lib/variant/VariantReader.cpp#L735, the size allocated for the array ad is adcount, whereas in the function https://github.com/Illumina/hap.py/blob/6c907ce3b02956bc239022db6edea7c48d6ddb8b/src/c++/lib/tools/BCFHelpers.cpp#L532  (which gets called at VarientReader:738 with ad) writes values.size() elements ( where values is a vector resembling the elements in the AD entry of a sample), causing a buffer overflow.

This could either be solved by: exiting on an incorrect AD field, truncating the interpreted AD fields so that they are of correct length or making the size of AD be the size of the actualAD fields (I haven’t actually looked though the code to see how this code path is used and what would be appropriate).

This bug can be shown to happen in the following vcf (I’ve exaggerated the number of AD fields to make problems happen). This should segfault on the next delete [] statement due to heap corruption:

##fileformat=VCFv4.1
##contig=<ID=chr1,length=10000>
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=AD,Number=.,Type=Integer,Description="Allelic depths for the ref and alt alleles in the order listed">
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  sample1
chr1    2       .       AAT     T       1       PASS    .       GT:AD   1/1:1,1,1,1,2,3,2,4,1,3,5,5,3,2,12,1,3,3,1

ThomasHickman avatar Apr 17 '18 16:04 ThomasHickman

Thanks for the report!

I think the temporary workaround for this would be to use bcftools annotate to remove AD (which isn't used by hap.py).

In the future, the code that uses VariantReader will be retired, but I'll include a fix for this in the next version.

pkrusche avatar Apr 17 '18 17:04 pkrusche