jflex icon indicating copy to clipboard operation
jflex copied to clipboard

Reduce RAM usage of the char->char class map

Open madrob opened this issue 9 years ago • 1 comments
trafficstars

Reduce RAM usage of the char->char class map by using byte[] rather than char[] when the number of classes is less than 257 - see https://issues.apache.org/jira/browse/LUCENE-6913.

This could be an automatic optimization, no option required.

madrob avatar Apr 01 '16 20:04 madrob

An alternative solution could be to only tableize latin1 or the bmp (int[128] or int[64k]) and use binary search for supplementary characters.

For example:

if (cp < table.length) {
   // fast path, direct lookup from table
   charClass = table[cp];
} else {
   // binary search CharClassInterval to get charclass index
   charClass = classes[Arrays.binarySearch(rangeStarts, cp)];
}

Basically the two approaches you see here in lucene: https://github.com/apache/lucene-solr/blob/master/lucene/core/src/java/org/apache/lucene/util/automaton/RunAutomaton.java#L179-L185

It may be an overall performance win, reduction of huge tables is more friendly on cpu caches, and at least in theory the bounds check on table.length is already happening today and the compiler should be able to fold it in.

rmuir avatar Aug 22 '16 14:08 rmuir

Ok, so after implementing it in #1043, and measuring performance, we do get a performance degradation of about 15% percent. I'm not entirely sure of the cause, but it looks like https://github.com/jflex-de/jflex/pull/697 has already achieved sufficient memory reduction and cache locality.

Using byte or char as value type instead of int seems to have secondary effects, maybe too many casting operations (in the VM) or unaligned accesses in memory.

Current (on 4bd9811e) performance with int is:

Benchmark                   (factor)  (input)  Mode  Cnt   Score    Error  Units
JFlexBench.baselineReader        100        1  avgt    5   0.087 ± 0.002  ms/op
JFlexBench.baselineReader        100        2  avgt    5   0.088 ± 0.001  ms/op
JFlexBench.baselineReader       1000        1  avgt    5   0.876 ± 0.047  ms/op
JFlexBench.baselineReader       1000        2  avgt    5   0.885 ± 0.026  ms/op
JFlexBench.baselineReader      10000        1  avgt    5   9.197 ± 0.299  ms/op
JFlexBench.baselineReader      10000        2  avgt    5   9.487 ± 0.321  ms/op
JFlexBench.noAction17Lexer       100        1  avgt    5   0.182 ± 0.003  ms/op
JFlexBench.noAction17Lexer       100        2  avgt    5   0.178 ± 0.002  ms/op
JFlexBench.noAction17Lexer      1000        1  avgt    5   1.859 ± 0.088  ms/op
JFlexBench.noAction17Lexer      1000        2  avgt    5   1.734 ± 0.031  ms/op
JFlexBench.noAction17Lexer     10000        1  avgt    5  17.997 ± 0.189  ms/op
JFlexBench.noAction17Lexer     10000        2  avgt    5  18.633 ± 2.083  ms/op
JFlexBench.noAction18Lexer       100        1  avgt    5   0.174 ± 0.001  ms/op
JFlexBench.noAction18Lexer       100        2  avgt    5   0.171 ± 0.001  ms/op
JFlexBench.noAction18Lexer      1000        1  avgt    5   1.726 ± 0.015  ms/op
JFlexBench.noAction18Lexer      1000        2  avgt    5   1.697 ± 0.018  ms/op
JFlexBench.noAction18Lexer     10000        1  avgt    5  17.928 ± 0.424  ms/op
JFlexBench.noAction18Lexer     10000        2  avgt    5  17.450 ± 2.412  ms/op
JFlexBench.noActionLexer         100        1  avgt    5   0.168 ± 0.005  ms/op
JFlexBench.noActionLexer         100        2  avgt    5   0.164 ± 0.003  ms/op
JFlexBench.noActionLexer        1000        1  avgt    5   1.663 ± 0.025  ms/op
JFlexBench.noActionLexer        1000        2  avgt    5   1.640 ± 0.074  ms/op
JFlexBench.noActionLexer       10000        1  avgt    5  16.769 ± 0.520  ms/op
JFlexBench.noActionLexer       10000        2  avgt    5  16.671 ± 0.344  ms/op

baseLineReader does no matching, just reads each character from the stream, NoAction17 is generated by jflex 1.7.0, NoAction18 is generated by jflex 1.8.2, and NoAction is generated by the current snapshot jflex. Input 1 is in the 0..0xFF character range, input 2 is higher unicdoe code points. The factor multiplies the length of the input.

After https://github.com/jflex-de/jflex/pull/1043, the performance with byte (on b73ba6f03dd) is:

Benchmark                   (factor)  (input)  Mode  Cnt   Score    Error  Units
JFlexBench.baselineReader        100        1  avgt    5   0.088 ± 0.005  ms/op
JFlexBench.baselineReader        100        2  avgt    5   0.088 ± 0.002  ms/op
JFlexBench.baselineReader       1000        1  avgt    5   0.883 ± 0.037  ms/op
JFlexBench.baselineReader       1000        2  avgt    5   0.884 ± 0.032  ms/op
JFlexBench.baselineReader      10000        1  avgt    5   9.236 ± 0.183  ms/op
JFlexBench.baselineReader      10000        2  avgt    5   9.479 ± 0.265  ms/op
JFlexBench.noAction17Lexer       100        1  avgt    5   0.184 ± 0.005  ms/op
JFlexBench.noAction17Lexer       100        2  avgt    5   0.180 ± 0.005  ms/op
JFlexBench.noAction17Lexer      1000        1  avgt    5   1.873 ± 0.077  ms/op
JFlexBench.noAction17Lexer      1000        2  avgt    5   1.782 ± 0.025  ms/op
JFlexBench.noAction17Lexer     10000        1  avgt    5  18.395 ± 0.781  ms/op
JFlexBench.noAction17Lexer     10000        2  avgt    5  18.045 ± 0.207  ms/op
JFlexBench.noAction18Lexer       100        1  avgt    5   0.177 ± 0.003  ms/op
JFlexBench.noAction18Lexer       100        2  avgt    5   0.173 ± 0.002  ms/op
JFlexBench.noAction18Lexer      1000        1  avgt    5   1.742 ± 0.013  ms/op
JFlexBench.noAction18Lexer      1000        2  avgt    5   1.743 ± 0.012  ms/op
JFlexBench.noAction18Lexer     10000        1  avgt    5  17.466 ± 0.097  ms/op
JFlexBench.noAction18Lexer     10000        2  avgt    5  17.434 ± 0.038  ms/op
JFlexBench.noActionLexer         100        1  avgt    5   0.191 ± 0.001  ms/op
JFlexBench.noActionLexer         100        2  avgt    5   0.190 ± 0.004  ms/op
JFlexBench.noActionLexer        1000        1  avgt    5   1.893 ± 0.020  ms/op
JFlexBench.noActionLexer        1000        2  avgt    5   1.877 ± 0.024  ms/op
JFlexBench.noActionLexer       10000        1  avgt    5  19.602 ± 0.072  ms/op
JFlexBench.noActionLexer       10000        2  avgt    5  18.890 ± 0.473  ms/op

This is noticeably slower. Forcing char instead of byte for the same lexer provides almost the same result:

Benchmark                   (factor)  (input)  Mode  Cnt   Score    Error  Units
JFlexBench.baselineReader        100        1  avgt    5   0.088 ±  0.002  ms/op
JFlexBench.baselineReader        100        2  avgt    5   0.088 ±  0.001  ms/op
JFlexBench.baselineReader       1000        1  avgt    5   0.886 ±  0.034  ms/op
JFlexBench.baselineReader       1000        2  avgt    5   0.877 ±  0.015  ms/op
JFlexBench.baselineReader      10000        1  avgt    5   9.108 ±  0.415  ms/op
JFlexBench.baselineReader      10000        2  avgt    5   9.380 ±  0.139  ms/op
JFlexBench.noAction17Lexer       100        1  avgt    5   0.180 ±  0.003  ms/op
JFlexBench.noAction17Lexer       100        2  avgt    5   0.173 ±  0.003  ms/op
JFlexBench.noAction17Lexer      1000        1  avgt    5   1.833 ±  0.023  ms/op
JFlexBench.noAction17Lexer      1000        2  avgt    5   1.718 ±  0.018  ms/op
JFlexBench.noAction17Lexer     10000        1  avgt    5  18.025 ±  0.184  ms/op
JFlexBench.noAction17Lexer     10000        2  avgt    5  17.803 ±  0.570  ms/op
JFlexBench.noAction18Lexer       100        1  avgt    5   0.168 ±  0.002  ms/op
JFlexBench.noAction18Lexer       100        2  avgt    5   0.164 ±  0.001  ms/op
JFlexBench.noAction18Lexer      1000        1  avgt    5   1.654 ±  0.020  ms/op
JFlexBench.noAction18Lexer      1000        2  avgt    5   1.618 ±  0.023  ms/op
JFlexBench.noAction18Lexer     10000        1  avgt    5  17.487 ±  0.124  ms/op
JFlexBench.noAction18Lexer     10000        2  avgt    5  17.545 ±  0.843  ms/op
JFlexBench.noActionLexer         100        1  avgt    5   0.190 ±  0.001  ms/op
JFlexBench.noActionLexer         100        2  avgt    5   0.185 ±  0.002  ms/op
JFlexBench.noActionLexer        1000        1  avgt    5   1.881 ±  0.021  ms/op
JFlexBench.noActionLexer        1000        2  avgt    5   1.840 ±  0.028  ms/op
JFlexBench.noActionLexer       10000        1  avgt    5  19.051 ±  0.122  ms/op
JFlexBench.noActionLexer       10000        2  avgt    5  18.664 ±  0.070  ms/op

lsf37 avatar Jan 22 '23 22:01 lsf37

I think with the drastic memory reduction of https://github.com/jflex-de/jflex/pull/697 memory pressure is low enough now that we can keep using int as default.

If you run into a scenario where memory consumption of the char -> char class mapping is a worse issue than a 15%-ish performance difference in the scanning engine, we could still implement #1043 as a user-controlled option, but I'll close this issue for now and wait until somebody actually runs into that kind of problem.

Do feel free to comment/re-open if you want this feature or have similar problems.

lsf37 avatar Jan 22 '23 22:01 lsf37