
Munge plugin fails to read last column of TSV when comment character present

Open · njd34 opened this issue 3 months ago · 1 comment

The +munge plugin appears to fail to read the last column of a tab-delimited header line when that line starts with a # character.

For example, the following file (tab-delimited, b37 positions) triggers a warning that the p-value column was not found:

#CHROM	POS	REF	ALT	BETA	SE	P
13	19141990	C	G	0.5	0.001	0.05
13	19144814	T	C	0.5	0.03	0.05
13	19147641	T	C	0.8	0.02	0.04
13	19149280	C	T	-1.1	0.001	1.0e-8

If the header starts with CHROM instead, it works fine. If I add a nonsense column at the end, it also works fine. If I move, e.g., the ALT column to the end, it errors out because the alternative allele column is not found.

Gemini suggests it could be something to do with the HTSlib parser but I haven't been able to verify.

njd34, Sep 19 '25 12:09

Thanks for reporting this bug. If you enter the bcftools source code directory and run the following command:

sed -i 's/str\.l--;/str.l--;\n        str.s[str.l] = '"'"'\\0'"'"';/' plugins/munge.c

It should fix the bug after you recompile.
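For readers wondering what the patch changes: the sed command adds a `str.s[str.l] = '\0';` line after each `str.l--;` in plugins/munge.c. The sketch below illustrates the general class of bug this addresses, under some assumptions. `line_buf` is a hypothetical stand-in for HTSlib's `kstring_t`, the stripped character is assumed to be a trailing newline, and the helper names are made up for illustration; none of this is the actual munge.c code, and it does not attempt to explain why only headers starting with # are affected.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for HTSlib's kstring_t: l is the logical length,
 * s points at the character buffer. */
typedef struct {
    size_t l;
    char *s;
} line_buf;

/* Unpatched pattern: shrink the length but leave the buffer untouched.
 * The byte at s[l] is still '\n', so C string functions still see it. */
static void chomp_length_only(line_buf *str) {
    if (str->l > 0 && str->s[str->l - 1] == '\n')
        str->l--;
}

/* Patched pattern: shrink the length and re-terminate the buffer. */
static void chomp_and_terminate(line_buf *str) {
    if (str->l > 0 && str->s[str->l - 1] == '\n') {
        str->l--;
        str->s[str->l] = '\0';
    }
}

/* Return 1 if the text after the last tab compares equal to name. */
static int last_column_is(const line_buf *str, const char *name) {
    const char *tab = strrchr(str->s, '\t');
    const char *last = tab ? tab + 1 : str->s;
    return strcmp(last, name) == 0;
}

/* Small demonstration using a shortened header line; returns 1 on success. */
int demo_header_parse(void) {
    char raw1[] = "#CHROM\tPOS\tP\n";
    char raw2[] = "#CHROM\tPOS\tP\n";
    line_buf a = { strlen(raw1), raw1 };
    line_buf b = { strlen(raw2), raw2 };

    chomp_length_only(&a);    /* last column is still seen as "P\n" */
    chomp_and_terminate(&b);  /* last column is seen as "P" */

    assert(!last_column_is(&a, "P"));
    assert(last_column_is(&b, "P"));
    return 1;
}
```

In this sketch only the last column is affected, because it is the only one followed by the leftover character. That matches the symptom reported above, where adding a dummy final column made the P column readable again.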

freeseek, Sep 19 '25 13:09