libfsm
libfsm copied to clipboard
Use Aho-Corasick for alts of literals during regexp AST compilation
I'll have more to say about this, but I'm too tired to write about it at the moment.
It's difficult to see the structure for unanchored strings (because we have no way to inform graphviz to weight the backwards edges differently).
But here's the simpler case for anchored strings, showing 100 alts of {3,20} characters in length. The same (randomly generated) data for each. Hopefully you can see the difference in the structure, with the trie shape visible.
Recursive Thompson construction | Aho-Corasick |
---|---|
![]() |
![]() |