librxvm

rxvm_fsearch is only line-based. Make this configurable

Open eriknyquist opened this issue 6 years ago • 0 comments

Currently, the BMH portion of rxvm_fsearch uses the following heuristic to run a BMH search on a fixed substring taken from an expression:

  • Run BMH string search using fixed substring
  • On match, from the first char. of match, back up to the last newline character
  • Run vm_execute (full regexp matching) from here
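The first two steps above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual librxvm code: `bmh_find`, `line_start` and `candidate_start` are hypothetical names, the input is assumed to be an in-memory buffer rather than a file, and the final `vm_execute` step is left out (the sketch only returns the offset where full regexp matching would begin).

```c
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: Boyer-Moore-Horspool search. Returns the offset of
 * the first occurrence of pat in text at or after `from`, or -1 if none. */
static long bmh_find(const char *text, size_t tlen,
                     const char *pat, size_t plen, size_t from)
{
    size_t skip[UCHAR_MAX + 1];
    size_t i;

    if (plen == 0 || plen > tlen) return -1;

    /* Default shift is the full pattern length; characters that occur in
     * the pattern (except its last position) get a shorter shift. */
    for (i = 0; i <= UCHAR_MAX; i++) skip[i] = plen;
    for (i = 0; i + 1 < plen; i++)
        skip[(unsigned char)pat[i]] = plen - 1 - i;

    i = from;
    while (i + plen <= tlen) {
        if (memcmp(text + i, pat, plen) == 0) return (long)i;
        i += skip[(unsigned char)text[i + plen - 1]];
    }
    return -1;
}

/* Hypothetical helper: back up from pos to the first character after the
 * previous newline, or to offset 0 if there is no earlier newline. */
static size_t line_start(const char *text, size_t pos)
{
    while (pos > 0 && text[pos - 1] != '\n') pos--;
    return pos;
}

/* Returns the offset at which full regexp matching would be started for
 * the first occurrence of `fixed`, or -1 if `fixed` does not occur. */
static long candidate_start(const char *text, size_t tlen, const char *fixed)
{
    long hit = bmh_find(text, tlen, fixed, strlen(fixed), 0);
    return hit < 0 ? -1 : (long)line_start(text, (size_t)hit);
}
```

Note that with no newline in the input, `line_start` walks all the way back to offset 0, so the full matcher would restart from the beginning of the buffer on every hit.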

Now, this causes fewer problems than it might sound at first. While compiling the expression, rxvm_compile keeps track of any fixed substrings potentially suitable for BMH, and several things will get an expression marked as "unsuitable for BMH" (for more details, you can look at the tests for this behaviour in tests/src/test_rxvm_lfix_heuristic.c). One of those things is a literal newline character in the expression: if the expression spans multiple lines, I won't know how far to back up. It's technically possible if I track the number of newlines in the expression, but I think that can be a future optimisation.

One issue, however, is that you can't really use rxvm_fsearch successfully on a file that contains no newline characters, and on a huge file with very few newlines it can be very slow. So I need to either provide a way to completely disable this line-based BMH substring method, or figure out some way to do it without newline characters at all. Or, I guess, provide some other alternative, for example, if the user is willing to specify:

1) the maximum size of any possible matching text, or
2) an alternate, frequently-occurring character, besides newline, that can be "backed up to" instead

Then we could still use BMH effectively for a full regexp in a file that is not line-based.
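A rough sketch of how either option could bound the back-up step is below. Everything in it is an assumption for illustration: `struct backup_opts` and `backup_start` are hypothetical names, not part of librxvm, and the delimiter variant assumes the expression can never match the delimiter itself (the same restriction that currently applies to newline). For the size bound, the reasoning is that a match of at most `max_match` bytes which contains the fixed substring at `[hit, hit + plen)` cannot begin before `hit + plen - max_match`.

```c
#include <stddef.h>

/* Hypothetical user-supplied options; not part of the librxvm API. */
struct backup_opts {
    int use_delim;     /* non-zero: back up to the last `delim` character */
    char delim;        /* alternate character used instead of newline */
    size_t max_match;  /* used when use_delim is zero; 0 means "no bound" */
};

/* Given a BMH hit at offset `hit` for a fixed substring of length `plen`,
 * return the earliest offset at which a full match could begin. */
static size_t backup_start(const char *text, size_t hit, size_t plen,
                           const struct backup_opts *o)
{
    size_t pos = hit;
    size_t start;

    if (o->use_delim) {
        /* Same idea as the newline heuristic, with a different character. */
        while (pos > 0 && text[pos - 1] != o->delim) pos--;
        return pos;
    }

    /* No bound given: the only safe answer is the start of the input. */
    if (o->max_match == 0) return 0;

    /* Clamp to 0 when the bound reaches past the start of the input. */
    if (hit + plen <= o->max_match) return 0;

    start = hit + plen - o->max_match;

    /* A bound smaller than the substring is meaningless; never start
     * after the hit itself. */
    return start > hit ? hit : start;
}
```

With the size bound, the cost of each back-up is capped by `max_match` regardless of how the file is structured, whereas the delimiter option still depends on the delimiter occurring frequently in the input.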

eriknyquist, Oct 18 '17 06:10