joni seems to be 1.5 times slower than simple JNI bindings

Open • denofevil opened this issue 5 years ago • 1 comment

Steps to reproduce

  1. Download and extract onig4j-v003-src.zip.
  2. Set the correct JAVA_HOME in jni/Makefile, then run make.
  3. Update the native library location in src/onig4j/OnigRegex.java.
  4. Run OnigPerformanceTest.

We got the following results:

  - java: 4261 ms
  - joni: 5798 ms
  - onig: 3511 ms
  - tm4e: 18 ms

With a straightforward approach, joni is roughly 1.5 times slower than the oniguruma bindings (5798 ms vs. 3511 ms).
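The slowdown factors can be recomputed from the reported timings. This small sketch (the class and method names are made up for illustration) divides each timing by the oniguruma JNI baseline:

```java
import java.util.Locale;

public final class BenchmarkRatios {
    // Timings reported in the issue, in milliseconds.
    static final long JAVA_MS = 4261;
    static final long JONI_MS = 5798;
    static final long ONIG_MS = 3511;

    /** Factor by which candidate is slower than baseline (values > 1 mean slower). */
    static double slowdown(long candidateMs, long baselineMs) {
        return (double) candidateMs / baselineMs;
    }

    public static void main(String[] args) {
        // Locale.ROOT keeps the decimal separator a '.' regardless of the JVM locale.
        System.out.println(String.format(Locale.ROOT, "joni vs onig: %.2fx", slowdown(JONI_MS, ONIG_MS)));
        System.out.println(String.format(Locale.ROOT, "java vs onig: %.2fx", slowdown(JAVA_MS, ONIG_MS)));
    }
}
```

Running it prints `joni vs onig: 1.65x` and `java vs onig: 1.21x`, so "1.5 times" is a rounded-down estimate. The 18 ms tm4e figure is left out because, per the issue, it mostly reflects result caching rather than raw matching speed.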

tm4e's large speedup seems to come from src/org/eclipse/tm4e/core/internal/oniguruma/OnigRegExp.java:49: if a regexp is called consecutively on the same string, it simply returns the most recent cached match result.

denofevil • Jul 11 '19 13:07

This is obviously ancient (sorry about that), but it still makes an interesting suggestion. The highlighted code is:

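    public OnigResult Search(OnigString str, int position) {
        // Cache hit: same string, the search starts at or after the previous start, and either
        // the previous search found nothing or its match still begins at or after position.
        if (lastSearchStrUniqueId == str.uniqueId() && lastSearchPosition <= position) {
            if (lastSearchResult == null || lastSearchResult.LocationAt(0) >= position) {
                return lastSearchResult;
            }
        }

        // Cache miss: run the real search and remember its inputs and result.
        lastSearchStrUniqueId = str.uniqueId();
        lastSearchPosition = position;
        lastSearchResult = Search(str.utf8_value(), position, str.utf8_length());
        return lastSearchResult;
    }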
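The tm4e method depends on its own OnigString and OnigResult types, so here is a standalone sketch of the same "last result" cache, assuming java.util.regex in place of joni/Oniguruma and String identity in place of `uniqueId()`. `CachingRegex` is a hypothetical name. Reuse is valid because a search from an earlier start that found its first match at or after the new position must return that same match from the new position too. That reasoning assumes a match at a given offset does not depend on where the search began, which anchors such as `\G` can violate.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical per-regex cache of the most recent search result (java.util.regex stand-in for joni). */
public final class CachingRegex {
    private final Pattern pattern;

    private boolean hasCache;          // false until the first real search
    private CharSequence lastInput;    // compared by identity; assumes inputs are never mutated
    private int lastPosition;          // start position of the last real search
    private int[] lastMatch;           // {start, end} of the last match, or null if none was found
    int cacheHits;                     // exposed only to illustrate how often the cache is used

    public CachingRegex(String regex) {
        this.pattern = Pattern.compile(regex);
    }

    /** Returns {start, end} of the first match at or after position, or null if there is none. */
    public int[] search(CharSequence input, int position) {
        if (hasCache && input == lastInput && lastPosition <= position
                && (lastMatch == null || lastMatch[0] >= position)) {
            cacheHits++;
            return copy(lastMatch);
        }
        Matcher m = pattern.matcher(input);
        int[] result = m.find(position) ? new int[] {m.start(), m.end()} : null;
        hasCache = true;
        lastInput = input;
        lastPosition = position;
        lastMatch = result;
        return copy(result);
    }

    // Hand out copies so callers cannot corrupt the cached array.
    private static int[] copy(int[] match) {
        return match == null ? null : match.clone();
    }
}
```

For example, searching `"aabbbcab"` for `b+` from 0 finds `{2, 5}`; a second search from 1 is served from the cache, while a search from 3 must run again because the cached match starts before 3.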

Looking at the benchmark, it seems to create a regexp cache: when _findNextMatchSync runs, each regexp keeps its last search around, so a later call on the same string can notice that the result is unchanged and get a cache hit. We (JRuby) cache compiled joni regexps, but we do not cache match results in the layer above joni.
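Why this pays off for a tokenizer: a loop like the one _findNextMatchSync drives asks many patterns for their earliest match at steadily advancing positions on the same line, and most patterns' previous matches are still ahead of the new position. The sketch below is a hypothetical `MiniScanner` (java.util.regex, not joni or vscode-textmate code) that picks the earliest match across patterns, breaking ties by pattern order, and counts real searches versus cache hits.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical multi-pattern scanner with a per-pattern "last result" cache. */
public final class MiniScanner {
    private final List<Pattern> patterns = new ArrayList<>();
    private final int[] lastPosition;   // per pattern: start of the last real search, -1 if none
    private final int[][] lastMatch;    // per pattern: {start, end} of the last match, or null
    private String lastInput;           // cache is valid only for the same String instance
    int searches;                       // real regex searches performed
    int cacheHits;                      // searches avoided by reusing a cached result

    public MiniScanner(String... regexes) {
        for (String r : regexes) patterns.add(Pattern.compile(r));
        lastPosition = new int[regexes.length];
        lastMatch = new int[regexes.length][];
        Arrays.fill(lastPosition, -1);
    }

    /** Returns {patternIndex, start, end} of the earliest match at or after position, or null. */
    public int[] findNextMatch(String input, int position) {
        if (input != lastInput) {       // new line: invalidate every cached result
            lastInput = input;
            Arrays.fill(lastPosition, -1);
        }
        int[] best = null;
        for (int i = 0; i < patterns.size(); i++) {
            int[] m;
            if (lastPosition[i] >= 0 && lastPosition[i] <= position
                    && (lastMatch[i] == null || lastMatch[i][0] >= position)) {
                cacheHits++;            // previous result is still the correct answer
                m = lastMatch[i];
            } else {
                searches++;
                Matcher matcher = patterns.get(i).matcher(input);
                m = matcher.find(position) ? new int[] {matcher.start(), matcher.end()} : null;
                lastPosition[i] = position;
                lastMatch[i] = m;
            }
            // Earliest start wins; on a tie the lower pattern index is kept.
            if (m != null && (best == null || m[0] < best[1])) {
                best = new int[] {i, m[0], m[1]};
            }
        }
        return best;
    }

    public static void main(String[] args) {
        MiniScanner scanner = new MiniScanner("[a-z]+", "[0-9]+", "[=;]");
        String line = "int x = 42;";
        int pos = 0;
        while (pos < line.length()) {
            int[] m = scanner.findNextMatch(line, pos);
            if (m == null) break;
            System.out.println("pattern " + m[0] + ": " + line.substring(m[1], m[2]));
            pos = m[2]; // none of these patterns matches the empty string, so pos always advances
        }
        System.out.println("searches=" + scanner.searches + " cacheHits=" + scanner.cacheHits);
    }
}
```

On `int x = 42;` the loop makes five `findNextMatch` calls; without the cache that would be 15 regex searches, while this sketch performs 7 and reuses 8 cached results.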

Perhaps there is value in caching results in joni itself? I think CRuby added some kind of result cache; I will check whether they have interesting data on how often this case occurs. Again, this may be more useful above joni than inside it, but I can see how putting it in joni could help more projects and spare each of them from implementing its own caching.

enebo • Mar 01 '23 15:03