Implementation of radix trie
This PR implements a radix trie with a reduced retained memory footprint. I had hoped to also see improvements in the benchmarks that measure processing, but I don't see anything conclusive :shrug: Perhaps that is because the branching in the matching methods is now a bit more complicated.
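For readers unfamiliar with the data structure, here is a minimal sketch of a radix (compressed) trie. It is not the implementation in this PR: the class name `RadixTrieSketch`, the `String`-based keys, and the `HashMap` children are assumptions made for illustration only. It shows the idea behind the smaller footprint: keys that share a prefix share one edge, and chains of single-child nodes are collapsed into a single labelled edge.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative radix trie over strings; a sketch, not the PR's implementation. */
class RadixTrieSketch {
    private static final class Node {
        String label;      // compressed edge label leading into this node
        boolean terminal;  // true if an inserted key ends exactly here
        final Map<Character, Node> children = new HashMap<>(); // keyed by first char of the child's label

        Node(String label, boolean terminal) {
            this.label = label;
            this.terminal = terminal;
        }
    }

    private final Node root = new Node("", false);

    void insert(String key) {
        Node node = root;
        int pos = 0;
        while (pos < key.length()) {
            Node child = node.children.get(key.charAt(pos));
            if (child == null) {
                // No edge starts with this character: store the whole remainder on one edge.
                node.children.put(key.charAt(pos), new Node(key.substring(pos), true));
                return;
            }
            int common = commonPrefixLength(child.label, key, pos);
            if (common < child.label.length()) {
                // The key diverges inside the edge: split the edge at the divergence point.
                Node tail = new Node(child.label.substring(common), child.terminal);
                tail.children.putAll(child.children);
                child.label = child.label.substring(0, common);
                child.terminal = false;
                child.children.clear();
                child.children.put(tail.label.charAt(0), tail);
            }
            pos += common;
            node = child;
        }
        node.terminal = true; // the key ends exactly at this node
    }

    boolean contains(String key) {
        Node node = root;
        int pos = 0;
        while (pos < key.length()) {
            Node child = node.children.get(key.charAt(pos));
            if (child == null || !key.startsWith(child.label, pos)) {
                return false;
            }
            pos += child.label.length();
            node = child;
        }
        return node.terminal;
    }

    /** Number of nodes including the root; a rough proxy for retained memory. */
    int nodeCount() {
        return count(root);
    }

    private static int count(Node node) {
        int total = 1;
        for (Node child : node.children.values()) {
            total += count(child);
        }
        return total;
    }

    private static int commonPrefixLength(String label, String key, int offset) {
        int max = Math.min(label.length(), key.length() - offset);
        int i = 0;
        while (i < max && label.charAt(i) == key.charAt(offset + i)) {
            i++;
        }
        return i;
    }

    public static void main(String[] args) {
        RadixTrieSketch trie = new RadixTrieSketch();
        trie.insert("customerId");
        trie.insert("customerName");
        trie.insert("customerEmail");
        System.out.println(trie.contains("customerName"));
        System.out.println(trie.contains("customer"));
        System.out.println(trie.nodeCount());
    }
}
```

With the three hypothetical keys above, the shared prefix `customer` lives on a single edge, so the trie holds five nodes including the root, where an uncompressed per-character trie would need one node per distinct prefix character.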
- [x] JSONPath tests are not passing, needs further investigation
- [x] Left a couple of TODOs with further optimizations
- [x] Look a bit more into why we didn't see any improvements for masking
[!NOTE] These results are affected by shared workloads on GitHub runners. Use the results only to detect possible regressions, and always rerun on a more stable machine before drawing any conclusions!
Benchmark results (pull-request, a22beaea292ab0878c28fdf49f11c6125a46255c)

| Benchmark | (characters) | (jsonPath) | (jsonSize) | (keyLength) | (maskedKeyProbability) | (numberOfTargetKeys) | (streamInputType) | (streamOutputType) | Mode | Cnt | Score | Error | Units |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| BaselineBenchmark.countBytes | unicode | N/A | 1kb | N/A | 0.1 | N/A | N/A | N/A | thrpt | 4 | 2587665.087 | ± 151853.193 | ops/s |
| BaselineBenchmark.countBytes:gc.alloc.rate.norm | unicode | N/A | 1kb | N/A | 0.1 | N/A | N/A | N/A | thrpt | 4 | 0.001 | ± 0.001 | B/op |
| BaselineBenchmark.jacksonParseAndMask | unicode | N/A | 1kb | N/A | 0.1 | N/A | N/A | N/A | thrpt | 4 | 29672.694 | ± 943.417 | ops/s |
| BaselineBenchmark.jacksonParseAndMask:gc.alloc.rate.norm | unicode | N/A | 1kb | N/A | 0.1 | N/A | N/A | N/A | thrpt | 4 | 64448.071 | ± 0.045 | B/op |
| BaselineBenchmark.jacksonParseOnly | unicode | N/A | 1kb | N/A | 0.1 | N/A | N/A | N/A | thrpt | 4 | 50272.695 | ± 619.645 | ops/s |
| BaselineBenchmark.jacksonParseOnly:gc.alloc.rate.norm | unicode | N/A | 1kb | N/A | 0.1 | N/A | N/A | N/A | thrpt | 4 | 24312.042 | ± 0.028 | B/op |
| BaselineBenchmark.regexReplace | unicode | N/A | 1kb | N/A | 0.1 | N/A | N/A | N/A | thrpt | 4 | 5310.533 | ± 45.073 | ops/s |
| BaselineBenchmark.regexReplace:gc.alloc.rate.norm | unicode | N/A | 1kb | N/A | 0.1 | N/A | N/A | N/A | thrpt | 4 | 61656.396 | ± 0.274 | B/op |
| BaselineBenchmark.writeFile | unicode | N/A | 1kb | N/A | 0.1 | N/A | N/A | N/A | thrpt | 4 | 4313.778 | ± 726.036 | ops/s |
| BaselineBenchmark.writeFile:gc.alloc.rate.norm | unicode | N/A | 1kb | N/A | 0.1 | N/A | N/A | N/A | thrpt | 4 | 10648.487 | ± 0.378 | B/op |
| InstanceCreationBenchmark.jsonMasker | N/A | N/A | N/A | N/A | N/A | 1000 | N/A | N/A | thrpt | 4 | 1508.685 | ± 85.417 | ops/s |
| InstanceCreationBenchmark.jsonMasker:gc.alloc.rate.norm | N/A | N/A | N/A | N/A | N/A | 1000 | N/A | N/A | thrpt | 4 | 1672691.179 | ± 24.145 | B/op |
| JsonMaskerBenchmark.jsonMaskerByteArrayStreams | unicode | false | 1kb | N/A | 0.1 | N/A | N/A | N/A | thrpt | 4 | 227254.718 | ± 5475.626 | ops/s |
| JsonMaskerBenchmark.jsonMaskerByteArrayStreams:gc.alloc.rate.norm | unicode | false | 1kb | N/A | 0.1 | N/A | N/A | N/A | thrpt | 4 | 10816.009 | ± 0.001 | B/op |
| JsonMaskerBenchmark.jsonMaskerByteArrayStreams | unicode | true | 1kb | N/A | 0.1 | N/A | N/A | N/A | thrpt | 4 | 214138.090 | ± 3665.709 | ops/s |
| JsonMaskerBenchmark.jsonMaskerByteArrayStreams:gc.alloc.rate.norm | unicode | true | 1kb | N/A | 0.1 | N/A | N/A | N/A | thrpt | 4 | 12240.009 | ± 0.001 | B/op |
| JsonMaskerBenchmark.jsonMaskerBytes | unicode | false | 1kb | N/A | 0.1 | N/A | N/A | N/A | thrpt | 4 | 398200.162 | ± 12722.187 | ops/s |
| JsonMaskerBenchmark.jsonMaskerBytes:gc.alloc.rate.norm | unicode | false | 1kb | N/A | 0.1 | N/A | N/A | N/A | thrpt | 4 | 2240.005 | ± 0.001 | B/op |
| JsonMaskerBenchmark.jsonMaskerBytes | unicode | true | 1kb | N/A | 0.1 | N/A | N/A | N/A | thrpt | 4 | 361469.933 | ± 10073.051 | ops/s |
| JsonMaskerBenchmark.jsonMaskerBytes:gc.alloc.rate.norm | unicode | true | 1kb | N/A | 0.1 | N/A | N/A | N/A | thrpt | 4 | 2072.005 | ± 0.001 | B/op |
| JsonMaskerBenchmark.jsonMaskerString | unicode | false | 1kb | N/A | 0.1 | N/A | N/A | N/A | thrpt | 4 | 216092.411 | ± 2268.145 | ops/s |
| JsonMaskerBenchmark.jsonMaskerString:gc.alloc.rate.norm | unicode | false | 1kb | N/A | 0.1 | N/A | N/A | N/A | thrpt | 4 | 10144.009 | ± 0.001 | B/op |
| JsonMaskerBenchmark.jsonMaskerString | unicode | true | 1kb | N/A | 0.1 | N/A | N/A | N/A | thrpt | 4 | 196081.607 | ± 1817.209 | ops/s |
| JsonMaskerBenchmark.jsonMaskerString:gc.alloc.rate.norm | unicode | true | 1kb | N/A | 0.1 | N/A | N/A | N/A | thrpt | 4 | 10992.010 | ± 0.001 | B/op |
| LargeKeySetInstanceCreationBenchmark.jsonMasker | N/A | N/A | N/A | 100 | N/A | 1000 | N/A | N/A | thrpt | 4 | 137.870 | ± 1.591 | ops/s |
| LargeKeySetInstanceCreationBenchmark.jsonMasker:gc.alloc.rate.norm | N/A | N/A | N/A | 100 | N/A | 1000 | N/A | N/A | thrpt | 4 | 32420278.352 | ± 215.669 | B/op |
| StreamTypeBenchmark.jsonMaskerStreams | N/A | N/A | 1kb | N/A | N/A | N/A | ByteArrayStream | ByteArrayStream | thrpt | 4 | 256799.050 | ± 4075.265 | ops/s |
| StreamTypeBenchmark.jsonMaskerStreams:gc.alloc.rate.norm | N/A | N/A | 1kb | N/A | N/A | N/A | ByteArrayStream | ByteArrayStream | thrpt | 4 | 12240.008 | ± 0.007 | B/op |
| StreamTypeBenchmark.jsonMaskerStreams | N/A | N/A | 1kb | N/A | N/A | N/A | ByteArrayStream | FileStream | thrpt | 4 | 4409.339 | ± 1264.289 | ops/s |
| StreamTypeBenchmark.jsonMaskerStreams:gc.alloc.rate.norm | N/A | N/A | 1kb | N/A | N/A | N/A | ByteArrayStream | FileStream | thrpt | 4 | 9280.479 | ± 0.396 | B/op |
| StreamTypeBenchmark.jsonMaskerStreams | N/A | N/A | 1kb | N/A | N/A | N/A | FileStream | ByteArrayStream | thrpt | 4 | 81257.822 | ± 920.347 | ops/s |
| StreamTypeBenchmark.jsonMaskerStreams:gc.alloc.rate.norm | N/A | N/A | 1kb | N/A | N/A | N/A | FileStream | ByteArrayStream | thrpt | 4 | 12368.026 | ± 0.022 | B/op |
| StreamTypeBenchmark.jsonMaskerStreams | N/A | N/A | 1kb | N/A | N/A | N/A | FileStream | FileStream | thrpt | 4 | 4067.271 | ± 1095.568 | ops/s |
| StreamTypeBenchmark.jsonMaskerStreams:gc.alloc.rate.norm | N/A | N/A | 1kb | N/A | N/A | N/A | FileStream | FileStream | thrpt | 4 | 9408.521 | ± 0.517 | B/op |
| ValueMaskerBenchmark.maskWithRawValueFunction | unicode | N/A | 1kb | N/A | 0.1 | N/A | N/A | N/A | thrpt | 4 | 575826.299 | ± 10482.694 | ops/s |
| ValueMaskerBenchmark.maskWithRawValueFunction:gc.alloc.rate.norm | unicode | N/A | 1kb | N/A | 0.1 | N/A | N/A | N/A | thrpt | 4 | 1600.003 | ± 0.001 | B/op |
| ValueMaskerBenchmark.maskWithStatic | unicode | N/A | 1kb | N/A | 0.1 | N/A | N/A | N/A | thrpt | 4 | 616368.698 | ± 56288.392 | ops/s |
| ValueMaskerBenchmark.maskWithStatic:gc.alloc.rate.norm | unicode | N/A | 1kb | N/A | 0.1 | N/A | N/A | N/A | thrpt | 4 | 1240.003 | ± 0.001 | B/op |
| ValueMaskerBenchmark.maskWithTextValueFunction | unicode | N/A | 1kb | N/A | 0.1 | N/A | N/A | N/A | thrpt | 4 | 519315.122 | ± 9345.862 | ops/s |
| ValueMaskerBenchmark.maskWithTextValueFunction:gc.alloc.rate.norm | unicode | N/A | 1kb | N/A | 0.1 | N/A | N/A | N/A | thrpt | 4 | 1888.004 | ± 0.001 | B/op |

Benchmark results (master, 007bb6ad998f6be9f9151d0e08bdc410bfed2ed9)

| Benchmark | (characters) | (jsonPath) | (jsonSize) | (keyLength) | (maskedKeyProbability) | (numberOfTargetKeys) | (streamInputType) | (streamOutputType) | Mode | Cnt | Score | Error | Units |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| BaselineBenchmark.countBytes | unicode | N/A | 1kb | N/A | 0.1 | N/A | N/A | N/A | thrpt | 4 | 2589161.779 | ± 147513.772 | ops/s |
| BaselineBenchmark.countBytes:gc.alloc.rate.norm | unicode | N/A | 1kb | N/A | 0.1 | N/A | N/A | N/A | thrpt | 4 | 0.001 | ± 0.001 | B/op |
| BaselineBenchmark.jacksonParseAndMask | unicode | N/A | 1kb | N/A | 0.1 | N/A | N/A | N/A | thrpt | 4 | 29378.132 | ± 950.980 | ops/s |
| BaselineBenchmark.jacksonParseAndMask:gc.alloc.rate.norm | unicode | N/A | 1kb | N/A | 0.1 | N/A | N/A | N/A | thrpt | 4 | 64272.072 | ± 0.050 | B/op |
| BaselineBenchmark.jacksonParseOnly | unicode | N/A | 1kb | N/A | 0.1 | N/A | N/A | N/A | thrpt | 4 | 50066.705 | ± 636.332 | ops/s |
| BaselineBenchmark.jacksonParseOnly:gc.alloc.rate.norm | unicode | N/A | 1kb | N/A | 0.1 | N/A | N/A | N/A | thrpt | 4 | 24312.042 | ± 0.029 | B/op |
| BaselineBenchmark.regexReplace | unicode | N/A | 1kb | N/A | 0.1 | N/A | N/A | N/A | thrpt | 4 | 5299.154 | ± 128.770 | ops/s |
| BaselineBenchmark.regexReplace:gc.alloc.rate.norm | unicode | N/A | 1kb | N/A | 0.1 | N/A | N/A | N/A | thrpt | 4 | 61656.397 | ± 0.266 | B/op |
| BaselineBenchmark.writeFile | unicode | N/A | 1kb | N/A | 0.1 | N/A | N/A | N/A | thrpt | 4 | 5514.633 | ± 3079.881 | ops/s |
| BaselineBenchmark.writeFile:gc.alloc.rate.norm | unicode | N/A | 1kb | N/A | 0.1 | N/A | N/A | N/A | thrpt | 4 | 10648.381 | ± 0.190 | B/op |
| InstanceCreationBenchmark.jsonMasker | N/A | N/A | N/A | N/A | N/A | 1000 | N/A | N/A | thrpt | 4 | 676.779 | ± 24.061 | ops/s |
| InstanceCreationBenchmark.jsonMasker:gc.alloc.rate.norm | N/A | N/A | N/A | N/A | N/A | 1000 | N/A | N/A | thrpt | 4 | 2638443.754 | ± 2.294 | B/op |
| JsonMaskerBenchmark.jsonMaskerByteArrayStreams | unicode | false | 1kb | N/A | 0.1 | N/A | N/A | N/A | thrpt | 4 | 246302.442 | ± 10080.332 | ops/s |
| JsonMaskerBenchmark.jsonMaskerByteArrayStreams:gc.alloc.rate.norm | unicode | false | 1kb | N/A | 0.1 | N/A | N/A | N/A | thrpt | 4 | 10816.008 | ± 0.001 | B/op |
| JsonMaskerBenchmark.jsonMaskerByteArrayStreams | unicode | true | 1kb | N/A | 0.1 | N/A | N/A | N/A | thrpt | 4 | 295133.247 | ± 3081.008 | ops/s |
| JsonMaskerBenchmark.jsonMaskerByteArrayStreams:gc.alloc.rate.norm | unicode | true | 1kb | N/A | 0.1 | N/A | N/A | N/A | thrpt | 4 | 11560.007 | ± 0.001 | B/op |
| JsonMaskerBenchmark.jsonMaskerBytes | unicode | false | 1kb | N/A | 0.1 | N/A | N/A | N/A | thrpt | 4 | 426052.074 | ± 9204.901 | ops/s |
| JsonMaskerBenchmark.jsonMaskerBytes:gc.alloc.rate.norm | unicode | false | 1kb | N/A | 0.1 | N/A | N/A | N/A | thrpt | 4 | 2240.005 | ± 0.001 | B/op |
| JsonMaskerBenchmark.jsonMaskerBytes | unicode | true | 1kb | N/A | 0.1 | N/A | N/A | N/A | thrpt | 4 | 420062.796 | ± 13471.776 | ops/s |
| JsonMaskerBenchmark.jsonMaskerBytes:gc.alloc.rate.norm | unicode | true | 1kb | N/A | 0.1 | N/A | N/A | N/A | thrpt | 4 | 1392.005 | ± 0.001 | B/op |
| JsonMaskerBenchmark.jsonMaskerString | unicode | false | 1kb | N/A | 0.1 | N/A | N/A | N/A | thrpt | 4 | 227751.633 | ± 3704.734 | ops/s |
| JsonMaskerBenchmark.jsonMaskerString:gc.alloc.rate.norm | unicode | false | 1kb | N/A | 0.1 | N/A | N/A | N/A | thrpt | 4 | 10144.009 | ± 0.001 | B/op |
| JsonMaskerBenchmark.jsonMaskerString | unicode | true | 1kb | N/A | 0.1 | N/A | N/A | N/A | thrpt | 4 | 212119.103 | ± 3419.989 | ops/s |
| JsonMaskerBenchmark.jsonMaskerString:gc.alloc.rate.norm | unicode | true | 1kb | N/A | 0.1 | N/A | N/A | N/A | thrpt | 4 | 10312.009 | ± 0.001 | B/op |
| LargeKeySetInstanceCreationBenchmark.jsonMasker | N/A | N/A | N/A | 100 | N/A | 1000 | N/A | N/A | thrpt | 4 | 10.815 | ± 0.463 | ops/s |
| LargeKeySetInstanceCreationBenchmark.jsonMasker:gc.alloc.rate.norm | N/A | N/A | N/A | 100 | N/A | 1000 | N/A | N/A | thrpt | 4 | 62613616.052 | ± 116.600 | B/op |
| StreamTypeBenchmark.jsonMaskerStreams | N/A | N/A | 1kb | N/A | N/A | N/A | ByteArrayStream | ByteArrayStream | thrpt | 4 | 250395.630 | ± 3329.874 | ops/s |
| StreamTypeBenchmark.jsonMaskerStreams:gc.alloc.rate.norm | N/A | N/A | 1kb | N/A | N/A | N/A | ByteArrayStream | ByteArrayStream | thrpt | 4 | 11560.008 | ± 0.007 | B/op |
| StreamTypeBenchmark.jsonMaskerStreams | N/A | N/A | 1kb | N/A | N/A | N/A | ByteArrayStream | FileStream | thrpt | 4 | 4399.552 | ± 1819.367 | ops/s |
| StreamTypeBenchmark.jsonMaskerStreams:gc.alloc.rate.norm | N/A | N/A | 1kb | N/A | N/A | N/A | ByteArrayStream | FileStream | thrpt | 4 | 8600.484 | ± 0.584 | B/op |
| StreamTypeBenchmark.jsonMaskerStreams | N/A | N/A | 1kb | N/A | N/A | N/A | FileStream | ByteArrayStream | thrpt | 4 | 84742.483 | ± 1625.590 | ops/s |
| StreamTypeBenchmark.jsonMaskerStreams:gc.alloc.rate.norm | N/A | N/A | 1kb | N/A | N/A | N/A | FileStream | ByteArrayStream | thrpt | 4 | 11688.025 | ± 0.021 | B/op |
| StreamTypeBenchmark.jsonMaskerStreams | N/A | N/A | 1kb | N/A | N/A | N/A | FileStream | FileStream | thrpt | 4 | 4291.711 | ± 616.750 | ops/s |
| StreamTypeBenchmark.jsonMaskerStreams:gc.alloc.rate.norm | N/A | N/A | 1kb | N/A | N/A | N/A | FileStream | FileStream | thrpt | 4 | 8728.491 | ± 0.367 | B/op |
| ValueMaskerBenchmark.maskWithRawValueFunction | unicode | N/A | 1kb | N/A | 0.1 | N/A | N/A | N/A | thrpt | 4 | 570716.027 | ± 32104.447 | ops/s |
| ValueMaskerBenchmark.maskWithRawValueFunction:gc.alloc.rate.norm | unicode | N/A | 1kb | N/A | 0.1 | N/A | N/A | N/A | thrpt | 4 | 1600.003 | ± 0.001 | B/op |
| ValueMaskerBenchmark.maskWithStatic | unicode | N/A | 1kb | N/A | 0.1 | N/A | N/A | N/A | thrpt | 4 | 657577.844 | ± 63065.677 | ops/s |
| ValueMaskerBenchmark.maskWithStatic:gc.alloc.rate.norm | unicode | N/A | 1kb | N/A | 0.1 | N/A | N/A | N/A | thrpt | 4 | 1240.003 | ± 0.001 | B/op |
| ValueMaskerBenchmark.maskWithTextValueFunction | unicode | N/A | 1kb | N/A | 0.1 | N/A | N/A | N/A | thrpt | 4 | 540648.618 | ± 6476.507 | ops/s |
| ValueMaskerBenchmark.maskWithTextValueFunction:gc.alloc.rate.norm | unicode | N/A | 1kb | N/A | 0.1 | N/A | N/A | N/A | thrpt | 4 | 1888.004 | ± 0.001 | B/op |

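Given the note about noisy runners, one cautious way to compare the two result sets is to treat each `Score ± Error` as an interval and flag a row only when the pull-request and master intervals do not overlap. The sketch below is an assumption about how one might read the tables, not part of the benchmark tooling; `BenchmarkIntervals` and `Result` are hypothetical names, and the numbers are copied from the two tables above.

```java
/** Illustrative helper for comparing JMH-style "score ± error" results. */
class BenchmarkIntervals {
    /** A benchmark result: mean score plus the reported error margin. */
    static final class Result {
        final double score;
        final double error;

        Result(double score, double error) {
            this.score = score;
            this.error = error;
        }

        double low() {
            return score - error;
        }

        double high() {
            return score + error;
        }
    }

    /** True if the two [score - error, score + error] intervals share any point. */
    static boolean overlaps(Result a, Result b) {
        return a.low() <= b.high() && b.low() <= a.high();
    }

    public static void main(String[] args) {
        // JsonMaskerBenchmark.jsonMaskerByteArrayStreams, jsonPath = true (ops/s)
        Result prStreams = new Result(214138.090, 3665.709);
        Result masterStreams = new Result(295133.247, 3081.008);

        // BaselineBenchmark.writeFile (ops/s), a row with very wide error margins
        Result prWriteFile = new Result(4313.778, 726.036);
        Result masterWriteFile = new Result(5514.633, 3079.881);

        System.out.println("byte array streams overlap: " + overlaps(prStreams, masterStreams));
        System.out.println("writeFile overlap: " + overlaps(prWriteFile, masterWriteFile));
    }
}
```

A non-overlapping pair is only a candidate for a closer look; it says nothing about the cause, and per the note above it should be rerun on a more stable machine before drawing conclusions.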
Quality Gate failed
Failed conditions
C Reliability Rating on New Code (required ≥ A)
I was testing one of the benchmarks locally, and key matching actually got quite a bit faster:
Before:

| Benchmark | (caseSensitive) | (mode) | Mode | Cnt | Score | Error | Units |
| --- | --- | --- | --- | --- | --- | --- | --- |
| KeyMatcherBenchmark.matchAllKeys | false | mask | thrpt | 2 | 16196074.837 | | ops/s |
| KeyMatcherBenchmark.matchAllKeys | false | allow | thrpt | 2 | 16032336.619 | | ops/s |
| KeyMatcherBenchmark.matchAllKeys | true | mask | thrpt | 2 | 13108860.366 | | ops/s |
| KeyMatcherBenchmark.matchAllKeys | true | allow | thrpt | 2 | 13075189.300 | | ops/s |

After:

| Benchmark | (caseSensitive) | (mode) | Mode | Cnt | Score | Error | Units |
| --- | --- | --- | --- | --- | --- | --- | --- |
| KeyMatcherBenchmark.matchAllKeys | false | mask | thrpt | 2 | 20520848.607 | | ops/s |
| KeyMatcherBenchmark.matchAllKeys | false | allow | thrpt | 2 | 20466876.402 | | ops/s |
| KeyMatcherBenchmark.matchAllKeys | true | mask | thrpt | 2 | 20085632.774 | | ops/s |
| KeyMatcherBenchmark.matchAllKeys | true | allow | thrpt | 2 | 19213484.306 | | ops/s |

So it looks like the reason we don't see much improvement in the existing benchmarks is that they run with maskedKeyProbability = 0.1, so there simply aren't many keys to mask. I will include this benchmark in the PR.
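To put a number on "quite a bit", the before/after throughput from the local `KeyMatcherBenchmark.matchAllKeys` run above can be turned into speedup ratios. This is plain arithmetic on the reported scores; `KeyMatcherSpeedup` is a hypothetical helper name, and with `Cnt = 2` and no error margin reported these ratios are indicative only.

```java
/** Computes throughput speedups from the local KeyMatcherBenchmark numbers. */
class KeyMatcherSpeedup {
    /** Ratio of new to old throughput; values above 1.0 mean the new code is faster. */
    static double speedup(double beforeOpsPerSec, double afterOpsPerSec) {
        return afterOpsPerSec / beforeOpsPerSec;
    }

    public static void main(String[] args) {
        String[] labels = {
            "caseSensitive=false, mask",
            "caseSensitive=false, allow",
            "caseSensitive=true, mask",
            "caseSensitive=true, allow"
        };
        // Scores in ops/s, copied from the "Before" and "After" results.
        double[] before = {16196074.837, 16032336.619, 13108860.366, 13075189.300};
        double[] after = {20520848.607, 20466876.402, 20085632.774, 19213484.306};

        for (int i = 0; i < labels.length; i++) {
            System.out.printf("%s: %.2fx%n", labels[i], speedup(before[i], after[i]));
        }
    }
}
```

On these numbers the case-insensitive configurations come out roughly 1.27x to 1.28x faster and the case-sensitive ones roughly 1.47x to 1.53x faster.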
Quality Gate passed
Issues
3 New issues
0 Accepted issues
Measures
0 Security Hotspots
99.1% Coverage on New Code
0.0% Duplication on New Code
@gavlyukovskiy ignore the comments and the last commit for now. I am applying the comments myself and want to play around with it a bit more, adding lower-level unit tests and some JavaDoc.
Quality Gate passed
Issues
3 New issues
0 Accepted issues
Measures
0 Security Hotspots
95.1% Coverage on New Code
0.0% Duplication on New Code