Reduce RAM usage of the char->char class map
Reduce RAM usage of the char->char class map by using byte[] rather than char[] when the number of classes is at most 256 - see https://issues.apache.org/jira/browse/LUCENE-6913.
This could be an automatic optimization, no option required.
An alternative solution could be to only tableize Latin-1 or the BMP (int[128] or int[64k]) and use binary search for supplementary characters.
For example:
    if (cp < table.length) {
      // fast path, direct lookup from table
      charClass = table[cp];
    } else {
      // binary search CharClassInterval to get char class index
      charClass = classes[Arrays.binarySearch(rangeStarts, cp)];
    }
Basically, these are the two approaches you can see in Lucene: https://github.com/apache/lucene-solr/blob/master/lucene/core/src/java/org/apache/lucene/util/automaton/RunAutomaton.java#L179-L185
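Spelled out, that lookup could look roughly like the sketch below. The names (charClassOf, rangeStarts, classes) are hypothetical, not existing JFlex code, and the value type of the dense table could just as well be byte[] or char[]. Note that Arrays.binarySearch returns (-(insertion point) - 1) when cp is not exactly a range start, so the result has to be adjusted to the interval that contains cp:

    import java.util.Arrays;

    final class CharClassLookup {
      /**
       * table[c] holds the char class for code points below table.length (e.g. Latin-1 or the BMP).
       * rangeStarts[i] is the first code point of interval i, classes[i] its char class index;
       * rangeStarts is sorted and rangeStarts[0] is <= any cp that misses the dense table.
       */
      static int charClassOf(int cp, int[] table, int[] rangeStarts, int[] classes) {
        if (cp < table.length) {
          // fast path, direct lookup from the dense table
          return table[cp];
        }
        int i = Arrays.binarySearch(rangeStarts, cp);
        if (i < 0) {
          // not an exact range start: insertion point minus one is the containing interval
          i = -i - 2;
        }
        return classes[i];
      }
    }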
It may be an overall performance win: shrinking the huge tables is friendlier to CPU caches, and at least in theory the bounds check against table.length already happens today, so the compiler should be able to fold it in.
Ok, so after implementing this in #1043 and measuring performance, we actually get a degradation of about 15%. I'm not entirely sure of the cause, but it looks like https://github.com/jflex-de/jflex/pull/697 has already achieved sufficient memory reduction and cache locality.
Using byte or char as the value type instead of int seems to have secondary effects, perhaps from extra casting operations in the VM or unaligned memory accesses.
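To illustrate what I mean by secondary effects, compare the two lookups below (a hypothetical micro-example, not generated scanner code): the int[] map is a single aligned 32-bit load, while the byte[] map needs a widening conversion plus a mask if char classes above 127 must stay non-negative.

    static int classFromIntMap(int[] map, int cp) {
      return map[cp];          // one aligned 32-bit load
    }

    static int classFromByteMap(byte[] map, int cp) {
      return map[cp] & 0xFF;   // byte load, sign extension, and a mask to keep classes > 127 positive
    }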
Current (on 4bd9811e) performance with int is:
Benchmark (factor) (input) Mode Cnt Score Error Units
JFlexBench.baselineReader 100 1 avgt 5 0.087 ± 0.002 ms/op
JFlexBench.baselineReader 100 2 avgt 5 0.088 ± 0.001 ms/op
JFlexBench.baselineReader 1000 1 avgt 5 0.876 ± 0.047 ms/op
JFlexBench.baselineReader 1000 2 avgt 5 0.885 ± 0.026 ms/op
JFlexBench.baselineReader 10000 1 avgt 5 9.197 ± 0.299 ms/op
JFlexBench.baselineReader 10000 2 avgt 5 9.487 ± 0.321 ms/op
JFlexBench.noAction17Lexer 100 1 avgt 5 0.182 ± 0.003 ms/op
JFlexBench.noAction17Lexer 100 2 avgt 5 0.178 ± 0.002 ms/op
JFlexBench.noAction17Lexer 1000 1 avgt 5 1.859 ± 0.088 ms/op
JFlexBench.noAction17Lexer 1000 2 avgt 5 1.734 ± 0.031 ms/op
JFlexBench.noAction17Lexer 10000 1 avgt 5 17.997 ± 0.189 ms/op
JFlexBench.noAction17Lexer 10000 2 avgt 5 18.633 ± 2.083 ms/op
JFlexBench.noAction18Lexer 100 1 avgt 5 0.174 ± 0.001 ms/op
JFlexBench.noAction18Lexer 100 2 avgt 5 0.171 ± 0.001 ms/op
JFlexBench.noAction18Lexer 1000 1 avgt 5 1.726 ± 0.015 ms/op
JFlexBench.noAction18Lexer 1000 2 avgt 5 1.697 ± 0.018 ms/op
JFlexBench.noAction18Lexer 10000 1 avgt 5 17.928 ± 0.424 ms/op
JFlexBench.noAction18Lexer 10000 2 avgt 5 17.450 ± 2.412 ms/op
JFlexBench.noActionLexer 100 1 avgt 5 0.168 ± 0.005 ms/op
JFlexBench.noActionLexer 100 2 avgt 5 0.164 ± 0.003 ms/op
JFlexBench.noActionLexer 1000 1 avgt 5 1.663 ± 0.025 ms/op
JFlexBench.noActionLexer 1000 2 avgt 5 1.640 ± 0.074 ms/op
JFlexBench.noActionLexer 10000 1 avgt 5 16.769 ± 0.520 ms/op
JFlexBench.noActionLexer 10000 2 avgt 5 16.671 ± 0.344 ms/op
baselineReader does no matching; it just reads each character from the stream. NoAction17 is generated by JFlex 1.7.0, NoAction18 by JFlex 1.8.2, and NoAction by the current JFlex snapshot. Input 1 stays in the 0..0xFF character range; input 2 uses higher Unicode code points. The factor multiplies the length of the input.
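For context, the benchmark shape is roughly the following; this is a hypothetical JMH sketch, not the actual JFlexBench source, and the class name, parameter handling, and setup strings are made up:

    import java.util.concurrent.TimeUnit;
    import org.openjdk.jmh.annotations.*;

    @BenchmarkMode(Mode.AverageTime)
    @OutputTimeUnit(TimeUnit.MILLISECONDS)
    @State(Scope.Benchmark)
    public class LexerBenchSketch {
      @Param({"100", "1000", "10000"}) public int factor;  // multiplies the input length
      @Param({"1", "2"}) public int input;                 // 1 = 0..0xFF range, 2 = higher code points

      String text;

      @Setup
      public void setup() {
        String unit = (input == 1) ? "abc 123 foo; " : "\u0391\u03b2\u30c6\ud83d\ude00 ";
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < factor; i++) {
          sb.append(unit);
        }
        text = sb.toString();
      }

      @Benchmark
      public int baselineReader() throws Exception {
        // no matching at all, just read every char from the stream
        java.io.Reader r = new java.io.StringReader(text);
        int chars = 0;
        while (r.read() != -1) {
          chars++;
        }
        return chars;
      }
    }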
After https://github.com/jflex-de/jflex/pull/1043, the performance with byte (on b73ba6f03dd) is:
Benchmark (factor) (input) Mode Cnt Score Error Units
JFlexBench.baselineReader 100 1 avgt 5 0.088 ± 0.005 ms/op
JFlexBench.baselineReader 100 2 avgt 5 0.088 ± 0.002 ms/op
JFlexBench.baselineReader 1000 1 avgt 5 0.883 ± 0.037 ms/op
JFlexBench.baselineReader 1000 2 avgt 5 0.884 ± 0.032 ms/op
JFlexBench.baselineReader 10000 1 avgt 5 9.236 ± 0.183 ms/op
JFlexBench.baselineReader 10000 2 avgt 5 9.479 ± 0.265 ms/op
JFlexBench.noAction17Lexer 100 1 avgt 5 0.184 ± 0.005 ms/op
JFlexBench.noAction17Lexer 100 2 avgt 5 0.180 ± 0.005 ms/op
JFlexBench.noAction17Lexer 1000 1 avgt 5 1.873 ± 0.077 ms/op
JFlexBench.noAction17Lexer 1000 2 avgt 5 1.782 ± 0.025 ms/op
JFlexBench.noAction17Lexer 10000 1 avgt 5 18.395 ± 0.781 ms/op
JFlexBench.noAction17Lexer 10000 2 avgt 5 18.045 ± 0.207 ms/op
JFlexBench.noAction18Lexer 100 1 avgt 5 0.177 ± 0.003 ms/op
JFlexBench.noAction18Lexer 100 2 avgt 5 0.173 ± 0.002 ms/op
JFlexBench.noAction18Lexer 1000 1 avgt 5 1.742 ± 0.013 ms/op
JFlexBench.noAction18Lexer 1000 2 avgt 5 1.743 ± 0.012 ms/op
JFlexBench.noAction18Lexer 10000 1 avgt 5 17.466 ± 0.097 ms/op
JFlexBench.noAction18Lexer 10000 2 avgt 5 17.434 ± 0.038 ms/op
JFlexBench.noActionLexer 100 1 avgt 5 0.191 ± 0.001 ms/op
JFlexBench.noActionLexer 100 2 avgt 5 0.190 ± 0.004 ms/op
JFlexBench.noActionLexer 1000 1 avgt 5 1.893 ± 0.020 ms/op
JFlexBench.noActionLexer 1000 2 avgt 5 1.877 ± 0.024 ms/op
JFlexBench.noActionLexer 10000 1 avgt 5 19.602 ± 0.072 ms/op
JFlexBench.noActionLexer 10000 2 avgt 5 18.890 ± 0.473 ms/op
This is noticeably slower. Forcing char instead of byte for the same lexer provides almost the same result:
Benchmark (factor) (input) Mode Cnt Score Error Units
JFlexBench.baselineReader 100 1 avgt 5 0.088 ± 0.002 ms/op
JFlexBench.baselineReader 100 2 avgt 5 0.088 ± 0.001 ms/op
JFlexBench.baselineReader 1000 1 avgt 5 0.886 ± 0.034 ms/op
JFlexBench.baselineReader 1000 2 avgt 5 0.877 ± 0.015 ms/op
JFlexBench.baselineReader 10000 1 avgt 5 9.108 ± 0.415 ms/op
JFlexBench.baselineReader 10000 2 avgt 5 9.380 ± 0.139 ms/op
JFlexBench.noAction17Lexer 100 1 avgt 5 0.180 ± 0.003 ms/op
JFlexBench.noAction17Lexer 100 2 avgt 5 0.173 ± 0.003 ms/op
JFlexBench.noAction17Lexer 1000 1 avgt 5 1.833 ± 0.023 ms/op
JFlexBench.noAction17Lexer 1000 2 avgt 5 1.718 ± 0.018 ms/op
JFlexBench.noAction17Lexer 10000 1 avgt 5 18.025 ± 0.184 ms/op
JFlexBench.noAction17Lexer 10000 2 avgt 5 17.803 ± 0.570 ms/op
JFlexBench.noAction18Lexer 100 1 avgt 5 0.168 ± 0.002 ms/op
JFlexBench.noAction18Lexer 100 2 avgt 5 0.164 ± 0.001 ms/op
JFlexBench.noAction18Lexer 1000 1 avgt 5 1.654 ± 0.020 ms/op
JFlexBench.noAction18Lexer 1000 2 avgt 5 1.618 ± 0.023 ms/op
JFlexBench.noAction18Lexer 10000 1 avgt 5 17.487 ± 0.124 ms/op
JFlexBench.noAction18Lexer 10000 2 avgt 5 17.545 ± 0.843 ms/op
JFlexBench.noActionLexer 100 1 avgt 5 0.190 ± 0.001 ms/op
JFlexBench.noActionLexer 100 2 avgt 5 0.185 ± 0.002 ms/op
JFlexBench.noActionLexer 1000 1 avgt 5 1.881 ± 0.021 ms/op
JFlexBench.noActionLexer 1000 2 avgt 5 1.840 ± 0.028 ms/op
JFlexBench.noActionLexer 10000 1 avgt 5 19.051 ± 0.122 ms/op
JFlexBench.noActionLexer 10000 2 avgt 5 18.664 ± 0.070 ms/op
I think that with the drastic memory reduction from https://github.com/jflex-de/jflex/pull/697, memory pressure is now low enough that we can keep using int as the default.
If you run into a scenario where memory consumption of the char -> char class mapping is a worse problem than a roughly 15% performance difference in the scanning engine, we could still implement #1043 as a user-controlled option. For now I'll close this issue and wait until somebody actually runs into that kind of problem.
Do feel free to comment or re-open if you want this feature or run into similar problems.