joni seems to be 1.5 times slower than simple JNI bindings
Steps to reproduce
- Download onig4j-v003-src.zip
- Update jni/Makefile with the proper JAVA_HOME, then run make
- Update the lib location in src/onig4j/OnigRegex.java
- Run OnigPerformanceTest
We got the following results:
- java: 4261ms
- joni: 5798ms
- onig: 3511ms
- tm4e: 18ms

With a straightforward approach, joni is about 1.5 times slower than the Oniguruma JNI bindings.
tm4e's major boost seems to come from src/org/eclipse/tm4e/core/internal/oniguruma/OnigRegExp.java:49: if a regexp is called repeatedly on the same string, it just returns the latest cached match result.
This is obviously ancient (sorry about that), but it still makes an interesting suggestion. The highlighted code is:
public OnigResult Search(OnigString str, int position) {
    if (lastSearchStrUniqueId == str.uniqueId() && lastSearchPosition <= position) {
        if (lastSearchResult == null || lastSearchResult.LocationAt(0) >= position) {
            return lastSearchResult;
        }
    }
    lastSearchStrUniqueId = str.uniqueId();
    lastSearchPosition = position;
    lastSearchResult = Search(str.utf8_value(), position, str.utf8_length());
    return lastSearchResult;
}
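For experimenting with this rule outside tm4e, here is a minimal, self-contained sketch of the same last-result cache. It is not tm4e or joni code: it assumes java.util.regex as a stand-in for joni so that it runs without dependencies, and the names CachedRegex, search and underlyingSearches are hypothetical. As in the quoted method, the string is compared by identity (standing in for uniqueId()), and a cached result, including a cached "no match", is reused whenever the new start position is at or after the previous one and the cached match does not begin before it.

```java
import java.util.regex.MatchResult;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch of tm4e's last-result cache, using java.util.regex
 * in place of joni. Not tm4e or joni API.
 */
final class CachedRegex {
    private final Pattern pattern;
    private String lastInput;        // compared by identity, standing in for uniqueId()
    private int lastPosition = -1;
    private MatchResult lastResult;  // null means "no match from lastPosition"
    private int underlyingSearches;  // how often the regex engine actually ran

    CachedRegex(String regex) {
        this.pattern = Pattern.compile(regex);
    }

    MatchResult search(String input, int position) {
        // Same string object and a start at or after the previous one: the
        // cached result is still the leftmost match from `position` as long
        // as it does not begin before `position` (or there was no match).
        if (input == lastInput && lastPosition <= position) {
            if (lastResult == null || lastResult.start() >= position) {
                return lastResult;
            }
        }
        underlyingSearches++;
        Matcher m = pattern.matcher(input);
        lastInput = input;
        lastPosition = position;
        // toMatchResult() returns a snapshot unaffected by later matcher use.
        lastResult = m.find(position) ? m.toMatchResult() : null;
        return lastResult;
    }

    int underlyingSearches() {
        return underlyingSearches;
    }
}
```

With the pattern b+ on "aabbbcbb", searching from 0, 1 and 2 runs the engine once and returns the same cached match (2 to 5); searching from 3 misses and runs it again. The reuse is only sound if a match found at a given offset does not depend on where the search started; patterns using \G are the obvious exception, so a real cache would need to account for them.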
Looking at the benchmark, it seems tm4e keeps a cache of regexps, and when _findNextMatchSync runs, each regexp keeps its last search around, so it can notice that the same search is being repeated and get a cache hit. We (JRuby) cache joni regexps, but we do not cache match results above joni itself.
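The cache pays off in a scanner like _findNextMatchSync because several regexps are searched from the same position and only the earliest match is consumed, so the other regexps' cached results usually stay valid on the next call. The sketch below illustrates that idea. It is not tm4e's implementation: it uses java.util.regex instead of joni, CachingScanner, findNextMatch and engineCalls are hypothetical names, and the tie-breaking rule (earlier pattern wins) is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.MatchResult;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical scanner in the spirit of _findNextMatchSync: tries several
 * patterns from a position and reports the one whose match starts earliest.
 * Each pattern keeps its own last-result cache.
 */
final class CachingScanner {
    /** Per-pattern cache state, following the same rule as the quoted Search. */
    private static final class Entry {
        final Pattern pattern;
        String lastInput;            // compared by identity
        int lastPosition = -1;
        MatchResult lastResult;      // null means "no match from lastPosition"

        Entry(Pattern pattern) {
            this.pattern = pattern;
        }
    }

    private final List<Entry> entries = new ArrayList<>();
    private int engineCalls;         // searches that actually ran the regex engine

    CachingScanner(String... regexes) {
        for (String regex : regexes) {
            entries.add(new Entry(Pattern.compile(regex)));
        }
    }

    /** Returns {patternIndex, start, end} of the earliest match, or null. */
    int[] findNextMatch(String input, int position) {
        int[] best = null;
        for (int i = 0; i < entries.size(); i++) {
            MatchResult r = search(entries.get(i), input, position);
            // A strictly earlier start wins, so ties go to the earlier pattern.
            if (r != null && (best == null || r.start() < best[1])) {
                best = new int[] {i, r.start(), r.end()};
            }
        }
        return best;
    }

    private MatchResult search(Entry e, String input, int position) {
        if (input == e.lastInput && e.lastPosition <= position
                && (e.lastResult == null || e.lastResult.start() >= position)) {
            return e.lastResult;     // cache hit: no regex engine call
        }
        engineCalls++;
        Matcher m = e.pattern.matcher(input);
        e.lastInput = input;
        e.lastPosition = position;
        e.lastResult = m.find(position) ? m.toMatchResult() : null;
        return e.lastResult;
    }

    int engineCalls() {
        return engineCalls;
    }
}
```

Tokenizing "abc 123 def" with the patterns [a-z]+, [0-9]+ and \s+ (advancing to the end of each reported match) takes five findNextMatch calls. Without caching that would be 15 engine searches; this sketch runs 7, because most patterns' previous matches still lie ahead of the new position.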
Perhaps there is value in caching results in joni? I think CRuby added some kind of result cache; I will try to find out whether they have interesting data on how often this happens. Again, this may be more useful above joni than inside it, but I can see how having it in joni could help more projects and save each of them from implementing their own caching.