bedtools2 icon indicating copy to clipboard operation
bedtools2 copied to clipboard

fix parsing headers 48105b07

Open kuk0 opened this issue 3 years ago • 4 comments

see the comments in https://github.com/arq5x/bedtools2/commit/48105b07dc665d333a06677aed8340f82decacd7# the two versions of isHeaderLine are not equivalent and the commit broke reading correct BED files

kuk0 avatar Jun 23 '21 22:06 kuk0

Why revert it? This is a critical optimization that makes it significantly faster.

38 avatar Jun 23 '21 23:06 38

I guess we could fix the parser - of course it has bug.

And please note, the original version isn't a reliable parsing code as well. The point is we need to handle most of the case.

38 avatar Jun 23 '21 23:06 38

This is a critical optimization that makes it significantly faster.

ok, well, but how much faster? do you have a benchmark? how many times is that function (isHeaderLine) called? e.g., if i have a file with 1M lines, but the header is just the first line, i would expect it to be called ~ once or twice (then it would be unnecessary to optimize it at all); if it's called 1M times, is it necessary?

kuk0 avatar Jun 24 '21 11:06 kuk0

i have updated the diff and tested it on this BED file:

chr	start	end	name	score	strand
NC_045512.2	0	29903	sars2	.	+

bedtools v2.29.2 can read this file bedtools v2.30.0 is broken

kuk0 avatar Jun 24 '21 11:06 kuk0