
filter2 command is too slow

Open y9c opened this issue 1 year ago • 7 comments

Compared with the filter command or awk, the filter2 command is much slower, especially for rules with multiple conditions.

It might be related to this function call inside the for-loop, which parses the expression again on every iteration: https://github.com/shenwei356/csvtk/blob/9407f73e2d72dddf5042c7dbb6299a180ea9cf4a/csvtk/cmd/filter2.go#L370-L376

y9c avatar Apr 11 '24 03:04 y9c

Yes, I noticed that. It is slow :(

shenwei356 avatar Apr 11 '24 07:04 shenwei356

Can we move the expression parsing outside the for-loop and run it only once?

y9c avatar Apr 23 '24 22:04 y9c

It is slow, but it has to be done like that, because filterStr1 is different in each iteration.

shenwei356 avatar May 17 '24 15:05 shenwei356

Why is filterStr1 different? Can we cache the parsed result?

y9c avatar May 17 '24 16:05 y9c

It's the expression. In '$age > 18', for example, $age needs to be replaced with the value from each row, so the string to parse changes for every row.
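To make the point concrete, here is a minimal sketch of that per-row substitution. It is not csvtk's actual code: `substituteRow` and the map-based row are hypothetical, chosen only to show why each row yields a different string that has to be parsed from scratch.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// substituteRow is a hypothetical helper (not taken from csvtk): it replaces
// every $name in expr with the matching value from the current row.
func substituteRow(expr string, row map[string]string) string {
	names := make([]string, 0, len(row))
	for name := range row {
		names = append(names, name)
	}
	// Longest names first, so that $age is not clobbered by a shorter $a.
	sort.Slice(names, func(i, j int) bool {
		if len(names[i]) != len(names[j]) {
			return len(names[i]) > len(names[j])
		}
		return names[i] < names[j]
	})
	pairs := make([]string, 0, 2*len(names))
	for _, name := range names {
		pairs = append(pairs, "$"+name, row[name])
	}
	return strings.NewReplacer(pairs...).Replace(expr)
}

func main() {
	rows := []map[string]string{
		{"age": "17"},
		{"age": "42"},
	}
	for _, row := range rows {
		// Each row produces a new expression string, so a parser that
		// works on the final string has to run once per row.
		fmt.Println(substituteRow("$age > 18", row))
	}
}
```

This prints `17 > 18` and then `42 > 18`: two different strings, which is why a parse result for one row cannot simply be reused for the next.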

shenwei356 avatar May 17 '24 17:05 shenwei356

Yes. What I mean is: can we parse the expression once into something like '$1>18', and then reuse the code of the filter command to do the computation for each row?
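A rough sketch of the "parse once, evaluate per row" idea, under stated assumptions. Everything here is hypothetical and not csvtk's API: `compile` and `predicate` are made-up names, and the sketch only understands a single space-separated comparison of the form `$column OP number`, which is far less than filter2 supports.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// predicate reports whether a row passes the filter.
type predicate func(row map[string]string) (bool, error)

// compile parses one comparison of the hypothetical form "$column OP number"
// exactly once and returns a closure that can be reused for every row.
func compile(expr string) (predicate, error) {
	fields := strings.Fields(expr)
	if len(fields) != 3 || !strings.HasPrefix(fields[0], "$") {
		return nil, fmt.Errorf("unsupported expression: %q", expr)
	}
	column, op := fields[0][1:], fields[1]
	threshold, err := strconv.ParseFloat(fields[2], 64)
	if err != nil {
		return nil, fmt.Errorf("invalid number %q: %w", fields[2], err)
	}
	var cmp func(a, b float64) bool
	switch op {
	case ">":
		cmp = func(a, b float64) bool { return a > b }
	case ">=":
		cmp = func(a, b float64) bool { return a >= b }
	case "<":
		cmp = func(a, b float64) bool { return a < b }
	case "<=":
		cmp = func(a, b float64) bool { return a <= b }
	default:
		return nil, fmt.Errorf("unsupported operator: %q", op)
	}
	return func(row map[string]string) (bool, error) {
		raw, ok := row[column]
		if !ok {
			return false, fmt.Errorf("column %q not found", column)
		}
		value, err := strconv.ParseFloat(raw, 64)
		if err != nil {
			return false, err
		}
		// Only the column lookup and the comparison run per row.
		return cmp(value, threshold), nil
	}, nil
}

func main() {
	keep, err := compile("$age > 18") // parsed once, outside the loop
	if err != nil {
		panic(err)
	}
	rows := []map[string]string{{"age": "17"}, {"age": "42"}}
	for _, row := range rows {
		ok, err := keep(row)
		if err != nil {
			panic(err)
		}
		fmt.Println(row["age"], ok)
	}
}
```

The design point is that the column reference stays symbolic after parsing and is bound to a value only at evaluation time. Whether filter2's full expression syntax can be handled this way is exactly the open question in this thread.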

y9c avatar May 17 '24 17:05 y9c

> parse the expression as something like '$1>18' and reuse the code of the filter command

I don't think so.

God, it's really slow~ I've used it a lot recently. I have to improve it when I have time ~

shenwei356 avatar Jun 27 '24 08:06 shenwei356