Benchmarks comparing with other DFA-Regex-Engines?

Open almondtools opened this issue 3 years ago • 6 comments

I have written a regex benchmark comparing different regex engines for Java. I recently came across your approach and am curious how it performs compared to the other alternatives:

  • You can run the benchmark on your own.
  • If your project were available as an artifact in a Maven repository, I would offer to extend regexbenchmark with your project and start a new benchmark run.

almondtools avatar Feb 27 '22 10:02 almondtools

Thanks for the note. I'll have a look at your benchmarks, and keep them in mind.

Right now, I have a few things that I think need to be addressed before I push this to maven, and cut a 0.1 release.

hyperpape avatar Mar 24 '22 03:03 hyperpape

@almondtools I was looking at the benchmarks--are there any scripts for handling the output?

hyperpape avatar Aug 15 '23 15:08 hyperpape

I am not sure I understand the question. I would suggest that you implement three things:

  • A benchmark extends MatcherBenchmark
  • An automaton that implements Automaton and is referenced in the benchmark (it is a wrapper around your algorithm)
  • A test extends MatcherBenchmarkTest

The tests search for a pattern in a sample and compare the number of results found with that of a reference implementation. They do not check whether every result is found at the correct location. I think the large test corpus (of the scaling benchmarks) makes it unlikely that a benchmark passes by pure luck.
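The count-based check described above can be sketched roughly as follows. This is a minimal illustration, not the regexbenchmark code: it uses java.util.regex as the reference implementation, and the class name MatchCountCheck, the method names, and the stand-in candidate in main are all hypothetical.

```java
import java.util.function.ToIntFunction;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatchCountCheck {

    /** Reference count: non-overlapping matches found by java.util.regex. */
    static int referenceCount(String pattern, String sample) {
        Matcher matcher = Pattern.compile(pattern).matcher(sample);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    /**
     * Returns true if the candidate reports the same number of matches as the
     * reference. Only the counts are compared, not the match positions.
     */
    static boolean agreesWithReference(String pattern, String sample,
                                       ToIntFunction<String> candidate) {
        return candidate.applyAsInt(sample) == referenceCount(pattern, sample);
    }

    public static void main(String[] args) {
        String sample = "abc abd abc";
        // Hypothetical stand-in for the engine under test: counts the literal "abc".
        ToIntFunction<String> candidate = text -> {
            int count = 0;
            int from = 0;
            while ((from = text.indexOf("abc", from)) >= 0) {
                count++;
                from += 3;
            }
            return count;
        };
        System.out.println(agreesWithReference("abc", sample, candidate));
    }
}
```

Because only counts are compared, a candidate that finds the right number of matches at the wrong offsets would still pass, which is why the size of the corpus matters.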

Does that help?

almondtools avatar Aug 15 '23 18:08 almondtools

Sorry, my earlier question was a bit vague.

Yes, I was able to implement those in a branch I have locally, and doing so helped me find two bugs in needle.

However, when I run the tests, it seems to give mostly unstructured output to the console. Is there a good technique for turning that data into a table or other format that's good for analysis so I can easily compare my library to others? I didn't know if I missed something in your repo that does that, or if there's a nicer way than reading the results and extracting data by hand.

hyperpape avatar Aug 15 '23 19:08 hyperpape

You probably found the *bench*.cmd files. They write the benchmark data to CSV and text output (examples are attached). Unfortunately, I did not develop tools to analyze or visualize the benchmark results. I did this for stringbench, but it took a lot of effort and is probably not easy to reuse.
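In the absence of such tooling, a small program can turn the CSV into a comparable table. The sketch below assumes the file uses the usual JMH CSV layout, with a header row containing "Benchmark" and "Score" columns, optional "Param: ..." columns, and period decimal separators; the attached result.csv may differ. The class name JmhCsvSummary and its methods are hypothetical.

```java
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class JmhCsvSummary {

    /** Splits one CSV line, honoring double-quoted fields (no escaped quotes). */
    static List<String> splitCsvLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean quoted = false;
        for (char c : line.toCharArray()) {
            if (c == '"') {
                quoted = !quoted;
            } else if (c == ',' && !quoted) {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());
        return fields;
    }

    /** Maps "benchmark name [param values]" to score, sorted by key. */
    static Map<String, Double> scores(List<String> lines) {
        List<String> header = splitCsvLine(lines.get(0));
        int nameCol = header.indexOf("Benchmark");
        int scoreCol = header.indexOf("Score");
        Map<String, Double> result = new TreeMap<>();
        for (String line : lines.subList(1, lines.size())) {
            if (line.trim().isEmpty()) {
                continue;
            }
            List<String> fields = splitCsvLine(line);
            StringBuilder key = new StringBuilder(fields.get(nameCol));
            // Append parameter values so parameterized runs get distinct keys.
            for (int i = 0; i < header.size() && i < fields.size(); i++) {
                if (header.get(i).startsWith("Param: ")) {
                    key.append(' ').append(fields.get(i));
                }
            }
            result.put(key.toString(), Double.parseDouble(fields.get(scoreCol)));
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        // Usage: java JmhCsvSummary result.csv
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        scores(lines).forEach((name, score) ->
                System.out.printf("%-70s %14.3f%n", name, score));
    }
}
```

Sorting by key places runs of the same benchmark with different parameters next to each other, which makes a side-by-side comparison of engines easier to read.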

I also noticed that the benchmarks will have to be adjusted for other versions of Java/JMH. Hopefully you have solved this already.

result.csv result.txt

almondtools avatar Aug 16 '23 03:08 almondtools

Whoops, my apologies. I overlooked the command files.

hyperpape avatar Aug 17 '23 22:08 hyperpape