Consider adding another regexp engine

Open vmarkovtsev opened this issue 7 years ago • 7 comments

We have already seen that Oniguruma improves enry's performance by a wide margin. There are Go bindings to the Rust regular expression engine - https://github.com/BurntSushi/rure-go People say that Rust has one of the fastest implementations, so it makes sense to check how enry performs with rure-go.

vmarkovtsev avatar Oct 21 '18 21:10 vmarkovtsev

This looks interesting! Thank you for bringing it up.

Before venturing into integration, I think it's better to find or produce a head-to-head comparison between the Rust-based rure-go and the C-based Oniguruma regexp impls:

  • features (do they support the same set of features, esp. those outside of RE2 that enry might rely on?)
  • performance (how do they both perform on the same synthetic benchmark?)

Knowing that would help us make an informed decision on integration.
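A minimal sketch of what such a comparison harness could look like, using only Go's standard `regexp` package (RE2 syntax) as a stand-in engine. The `Matcher` interface, `CompileFunc`, `Supports` and `TimeMatches` are hypothetical names made up for illustration, not part of enry; an Oniguruma or rure-go binding would need its own small adapter to fit this shape, and the sample patterns are not taken from enry's rules.

```go
package main

import (
	"fmt"
	"regexp"
	"time"
)

// Matcher is a hypothetical minimal interface each engine would be wrapped in.
type Matcher interface {
	MatchString(s string) bool
}

// CompileFunc builds a Matcher, or returns an error if the engine
// does not support the pattern's syntax.
type CompileFunc func(pattern string) (Matcher, error)

// stdCompile adapts Go's standard library engine (RE2 syntax).
func stdCompile(pattern string) (Matcher, error) {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return nil, err
	}
	return re, nil
}

// Supports reports, per pattern, whether the engine can compile it.
func Supports(compile CompileFunc, patterns []string) map[string]bool {
	out := make(map[string]bool, len(patterns))
	for _, p := range patterns {
		_, err := compile(p)
		out[p] = err == nil
	}
	return out
}

// TimeMatches runs the matcher over all inputs `rounds` times and returns
// the total number of matches together with the elapsed wall-clock time.
func TimeMatches(m Matcher, inputs []string, rounds int) (int, time.Duration) {
	start := time.Now()
	hits := 0
	for i := 0; i < rounds; i++ {
		for _, s := range inputs {
			if m.MatchString(s) {
				hits++
			}
		}
	}
	return hits, time.Since(start)
}

func main() {
	// Illustrative patterns: one plain, one lookahead, one backreference.
	patterns := []string{`^#!.*\bpython\b`, `(?=foo)bar`, `(a)\1`}
	support := Supports(stdCompile, patterns)
	for _, p := range patterns {
		fmt.Printf("%q supported=%v\n", p, support[p])
	}

	m, err := stdCompile(patterns[0])
	if err != nil {
		panic(err)
	}
	inputs := []string{"#!/usr/bin/env python", "#!/bin/sh"}
	hits, elapsed := TimeMatches(m, inputs, 100000)
	fmt.Println("hits:", hits, "elapsed:", elapsed)
}
```

Running the same `Supports` and `TimeMatches` calls with a second `CompileFunc` would give the feature and performance columns side by side. For real numbers, `testing.B` benchmarks over enry's actual patterns would be more trustworthy than this wall-clock loop.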

The same process could be applied to other ways of speeding regexps up, e.g. using pre-compiled state machines like ragel.
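To illustrate the state-machine idea, here is a hand-written sketch (not ragel output) of a matcher for one made-up pattern, checked against the equivalent standard library regexp. The pattern and the function name `matchInclude` are assumptions chosen for the example and do not come from enry.

```go
package main

import (
	"fmt"
	"regexp"
)

// includeRe is the regexp the state machine below is meant to replace.
var includeRe = regexp.MustCompile(`^[ \t]*#[ \t]*include[ \t]*[<"]`)

// matchInclude is a hand-coded state machine for the same pattern.
// States: 0 = before '#', 1 = after '#', 2 = inside "include", 3 = after "include".
func matchInclude(s string) bool {
	const kw = "include"
	state, k := 0, 0
	for i := 0; i < len(s); i++ {
		c := s[i]
		blank := c == ' ' || c == '\t'
		switch state {
		case 0:
			switch {
			case blank: // stay in state 0
			case c == '#':
				state = 1
			default:
				return false
			}
		case 1:
			switch {
			case blank: // stay in state 1
			case c == kw[0]:
				state, k = 2, 1
			default:
				return false
			}
		case 2:
			if c != kw[k] {
				return false
			}
			k++
			if k == len(kw) {
				state = 3
			}
		case 3:
			switch {
			case blank: // stay in state 3
			case c == '<' || c == '"':
				return true
			default:
				return false
			}
		}
	}
	return false
}

func main() {
	samples := []string{`#include <stdio.h>`, `  #  include "a.h"`, `#import <x>`, `// #include <x>`}
	for _, s := range samples {
		fmt.Printf("%q regexp=%v fsm=%v\n", s, includeRe.MatchString(s), matchInclude(s))
	}
}
```

A generator like ragel would produce this kind of code from the pattern automatically; the trade-off is an extra build step in exchange for avoiding a regexp engine at run time.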

bzz avatar Oct 23 '18 14:10 bzz

Note that we had some concurrency problems using oniguruma in gitbase: https://github.com/src-d/gitbase/issues/544

We stopped generating binaries that use oniguruma because it is not stable.

A comparison of other solutions: https://rust-leipzig.github.io/regex/2017/03/28/comparison-of-regex-engines/

Also, there are other solutions available based on Onigmo, an Oniguruma fork: https://github.com/ungerik/gonigmo

ajnavarro avatar Oct 23 '18 15:10 ajnavarro

Another possibility: https://github.com/logrusorgru/grokky based on re2

vmarkovtsev avatar Oct 23 '18 15:10 vmarkovtsev

The Rust implementation is just inspired by https://github.com/google/re2.

btw. these libraries are not huge and could improve performance for many projects/teams, so why not start porting one of the solutions to native Go (maybe tailored to our needs)? At least we could get rid of cgo.

or just read :) https://medium.com/@dgryski/speeding-up-regexp-matching-with-ragel-4727f1c16027

kuba-- avatar Nov 03 '18 02:11 kuba--

To keep the party going - if anyone is interested in complicating the build and CI envs, there is also https://github.com/intel/hyperscan

bzz avatar Feb 14 '19 12:02 bzz

Note that gitbase will continue to use oniguruma. We forked the Go bindings for maintenance: https://github.com/src-d/go-oniguruma But AFAIK this introduces race conditions that the user currently has to avoid.

@kuba-- or @ajnavarro know more about this.

smola avatar Mar 15 '19 16:03 smola

go-mysql-server already switched to go-oniguruma as the default regex engine, so gitbase will auto-switch after upgrading mysql. Regarding the race condition - it was already solved in gitbase by introducing a pool of regex parsers.
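For readers unfamiliar with the approach, a rough sketch of the pooling idea follows. It is not gitbase's actual code: `Pool`, `NewPool`, `pooledMatcher` and `compileGoSuffix` are hypothetical names, and Go's standard `regexp` is used as a stand-in engine. The standard `*regexp.Regexp` is already safe for concurrent use, so the pool only matters for engines whose compiled objects are not.

```go
package main

import (
	"fmt"
	"regexp"
	"sync"
)

// pooledMatcher is the minimal behaviour the pool needs from an engine.
type pooledMatcher interface {
	MatchString(s string) bool
}

// Pool holds several independently compiled matchers and hands out one
// at a time, so no single instance is used by two goroutines at once.
type Pool struct {
	ch chan pooledMatcher
}

// NewPool compiles `size` separate matchers using the given constructor.
func NewPool(size int, compile func() (pooledMatcher, error)) (*Pool, error) {
	p := &Pool{ch: make(chan pooledMatcher, size)}
	for i := 0; i < size; i++ {
		m, err := compile()
		if err != nil {
			return nil, err
		}
		p.ch <- m
	}
	return p, nil
}

// MatchString borrows a matcher, uses it, and returns it to the pool.
// It blocks while all matchers are in use.
func (p *Pool) MatchString(s string) bool {
	m := <-p.ch
	defer func() { p.ch <- m }()
	return m.MatchString(s)
}

// compileGoSuffix is an illustrative constructor using the stdlib engine.
func compileGoSuffix() (pooledMatcher, error) {
	re, err := regexp.Compile(`\.go$`)
	if err != nil {
		return nil, err
	}
	return re, nil
}

func main() {
	pool, err := NewPool(4, compileGoSuffix)
	if err != nil {
		panic(err)
	}
	var (
		wg   sync.WaitGroup
		mu   sync.Mutex
		hits int
	)
	for _, name := range []string{"main.go", "lib.rs", "util.go"} {
		wg.Add(1)
		go func(n string) {
			defer wg.Done()
			if pool.MatchString(n) {
				mu.Lock()
				hits++
				mu.Unlock()
			}
		}(name)
	}
	wg.Wait()
	fmt.Println("matching files:", hits)
}
```

A buffered channel is used here rather than `sync.Pool` because it keeps the number of compiled matchers fixed, which seems preferable when each one wraps a cgo resource.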

kuba-- avatar Mar 15 '19 17:03 kuba--