bedtools2
bedtools2 copied to clipboard
fix parsing headers 48105b07
see the comments in https://github.com/arq5x/bedtools2/commit/48105b07dc665d333a06677aed8340f82decacd7#
the two versions of isHeaderLine
are not equivalent and the commit broke reading correct BED files
Why revert it? This is a critical optimization that makes it significantly faster.
I guess we could fix the parser - of course it has bug.
And please note, the original version isn't a reliable parsing code as well. The point is we need to handle most of the case.
This is a critical optimization that makes it significantly faster.
ok, well, but how much faster? do you have a benchmark?
how many times is that function (isHeaderLine
) called?
e.g., if i have a file with 1M lines, but the header is just the first line, i would expect it to be called ~ once or twice (then it would be unnecessary to optimize it at all); if it's called 1M times, is it necessary?
i have updated the diff and tested it on this BED file:
chr start end name score strand
NC_045512.2 0 29903 sars2 . +
bedtools v2.29.2 can read this file bedtools v2.30.0 is broken