tiny-regex-c icon indicating copy to clipboard operation
tiny-regex-c copied to clipboard

regex compilation is making things slower?

Open marler8997 opened this issue 4 years ago • 1 comments

After my experiment to change to a "flat memory layout" here (https://github.com/kokke/tiny-regex-c/pull/62), I had a suspicion that the regex compilation may not be improving performance by much. It took a while but I re-implemented everything without compilation in this commit https://github.com/marler8997/tiny-regex-c/commit/8311dc883418407e83fcdf2468867e3f3448ca93. Turns out that without regex compilation, the implementation is around 300 SLOC (instead of 500) and about 15% faster on my machine.

But hold on, it probably wasn't a fair race. My commit also has some other changes that help out performance (like https://github.com/kokke/tiny-regex-c/pull/63). If all things were equal, the compiled version still might be faster in some cases, I'm not actually sure. It does seem clear that both the compiled and non-compiled versions perform very similarly, but I think each one will be better in certain environments , on certain machines or with different workloads.

That being said, I feel like the change aligns well with the goals of this project, the primary one in the name "tiny-regex-c" which is now even "tiny-er". The other nice benefit is not needing to allocate anything for the regex objects nor needing static memory for them. (makes https://github.com/kokke/tiny-regex-c/issues/3 a non issue).

I also noticed that while removing compilation, I also happened to fix a few bugs (see some of them below). I also added makefile targets for each test along with a comprehensive test that runs all the targets from a clean repo (because if you don't write a test for it, then it will break).

I'm unsure whether @kokke will want to take this commit as it's such a huge fundamental change, but I figure I'd let you know the implementation is here, feel free to take all or none of the idea into this repository. I'm also happy to submit PR's for individual changes from this idea as I'm sure you'll not want to take everything all at once (I know I wouldn't). However, right now I've got 7 PRs in the queue so I'll hold off on making any more until some of those ones are settled.

Summary of related issues

fixes https://github.com/kokke/tiny-regex-c/issues/3 fixes https://github.com/kokke/tiny-regex-c/issues/40 fixes part (1) of https://github.com/kokke/tiny-regex-c/issues/53 invalidates https://github.com/kokke/tiny-regex-c/issues/59 fixes https://github.com/kokke/tiny-regex-c/issues/66 fixes https://github.com/kokke/tiny-regex-c/issues/69

marler8997 avatar Mar 08 '21 11:03 marler8997

When I wrote such a simple matcher back in the 90ties, I skipped the compilation part, as it is slower and needs memory. Just dynamic matching was enough. I had even group expressions then, and the matchers where much simpler, similar to Rob Pike's. I wrote it for asterisk, but cannot find it anymore.

rurban avatar Jun 20 '22 05:06 rurban