add support for 'front coded' string dictionaries for smaller string columns
Closes #3922
Description
This PR adds a new way of storing STRING typed columns, using an incremental encoding strategy called 'front coding'. Essentially, sorted values are split into buckets; the first value in each bucket is written completely, and each remaining value is stored as a pair: the length of the prefix it shares with the first value, and the fragment that remains after that prefix. Using the quickstart wikipedia example, this results in nicely reduced segment sizes with relatively little performance penalty in most typical Druid queries.
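To make the bucketing concrete, here is a minimal sketch of the encoding step in Java. The class and helper names are made up for illustration; this is not the actual `FrontCodedIndexed` code, just the scheme described above (first value whole, later values as a prefix length plus the remaining fragment):

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class FrontCodingSketch
{
  // Number of leading bytes shared by two byte arrays.
  static int commonPrefixLength(byte[] a, byte[] b)
  {
    final int limit = Math.min(a.length, b.length);
    for (int i = 0; i < limit; i++) {
      if (a[i] != b[i]) {
        return i;
      }
    }
    return limit;
  }

  // Split sorted values into buckets; the first value of each bucket is kept
  // whole, and later values become {prefixLength, remainingFragment} pairs.
  static List<List<Object[]>> encode(List<String> sortedValues, int bucketSize)
  {
    final List<List<Object[]>> buckets = new ArrayList<>();
    for (int start = 0; start < sortedValues.size(); start += bucketSize) {
      final List<Object[]> bucket = new ArrayList<>();
      final byte[] first = sortedValues.get(start).getBytes(StandardCharsets.UTF_8);
      bucket.add(new Object[]{0, first});
      final int end = Math.min(start + bucketSize, sortedValues.size());
      for (int i = start + 1; i < end; i++) {
        final byte[] value = sortedValues.get(i).getBytes(StandardCharsets.UTF_8);
        final int prefix = commonPrefixLength(first, value);
        final byte[] fragment = new byte[value.length - prefix];
        System.arraycopy(value, prefix, fragment, 0, fragment.length);
        bucket.add(new Object[]{prefix, fragment});
      }
      buckets.add(bucket);
    }
    return buckets;
  }
}
```

For a sorted bucket like `["apple", "applesauce", "banana"]`, "applesauce" is stored as a prefix length of 5 plus the fragment "sauce", while "banana" shares nothing with "apple" and is stored as a prefix length of 0 plus its full bytes.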
Segment Size and Performance
Wikipedia "quickstart" segments:
GenericIndexed<ByteBuffer> "singleThreaded" vs FrontCodedIndexed with bucket sizes 4 and 16:
Benchmark (indexType) (numElements) (numOperations) (width) Mode Cnt Score Error Units
FrontCodedIndexedBenchmark.get generic 10000 10000 16 avgt 5 12.842 ± 0.309 ns/op
FrontCodedIndexedBenchmark.get generic 100000 10000 16 avgt 5 14.173 ± 0.259 ns/op
FrontCodedIndexedBenchmark.get front-coded-4 10000 10000 16 avgt 5 39.418 ± 0.989 ns/op
FrontCodedIndexedBenchmark.get front-coded-4 100000 10000 16 avgt 5 43.614 ± 1.081 ns/op
FrontCodedIndexedBenchmark.get front-coded-16 10000 10000 16 avgt 5 68.667 ± 0.674 ns/op
FrontCodedIndexedBenchmark.get front-coded-16 100000 10000 16 avgt 5 72.695 ± 0.170 ns/op
FrontCodedIndexedBenchmark.indexOf generic 10000 10000 16 avgt 5 161.167 ± 1.945 ns/op
FrontCodedIndexedBenchmark.indexOf generic 100000 10000 16 avgt 5 229.376 ± 4.963 ns/op
FrontCodedIndexedBenchmark.indexOf front-coded-4 10000 10000 16 avgt 5 243.259 ± 0.290 ns/op
FrontCodedIndexedBenchmark.indexOf front-coded-4 100000 10000 16 avgt 5 377.807 ± 5.362 ns/op
FrontCodedIndexedBenchmark.indexOf front-coded-16 10000 10000 16 avgt 5 252.050 ± 0.786 ns/op
FrontCodedIndexedBenchmark.indexOf front-coded-16 100000 10000 16 avgt 5 381.047 ± 4.761 ns/op
FrontCodedIndexedBenchmark.iterator generic 10000 10000 16 avgt 5 39.341 ± 0.623 ns/op
FrontCodedIndexedBenchmark.iterator generic 100000 10000 16 avgt 5 384.242 ± 0.899 ns/op
FrontCodedIndexedBenchmark.iterator front-coded-4 10000 10000 16 avgt 5 95.082 ± 18.971 ns/op
FrontCodedIndexedBenchmark.iterator front-coded-4 100000 10000 16 avgt 5 690.636 ± 27.592 ns/op
FrontCodedIndexedBenchmark.iterator front-coded-16 10000 10000 16 avgt 5 77.685 ± 29.205 ns/op
FrontCodedIndexedBenchmark.iterator front-coded-16 100000 10000 16 avgt 5 606.071 ± 25.713 ns/op
FrontCodedIndexedBenchmark.get:encoded size generic 10000 10000 16 avgt 5 240010.000 bytes
FrontCodedIndexedBenchmark.get:encoded size generic 100000 10000 16 avgt 5 2400010.000 bytes
FrontCodedIndexedBenchmark.get:encoded size front-coded-4 10000 10000 16 avgt 5 169992.000 bytes
FrontCodedIndexedBenchmark.get:encoded size front-coded-4 100000 10000 16 avgt 5 1636900.000 bytes
FrontCodedIndexedBenchmark.get:encoded size front-coded-16 10000 10000 16 avgt 5 164187.000 bytes
FrontCodedIndexedBenchmark.get:encoded size front-coded-16 100000 10000 16 avgt 5 1564828.000 bytes
How this translates into queries has so far been measured with SqlBenchmark, though its generated dataset does pretty poorly with this encoding because it is composed of numbers translated directly into strings, leaving few common prefixes to take advantage of (383MB with generic indexed compared to 381MB with front-coded indexed). Since the size advantage is essentially absent here, this is likely near the worst case for this encoding strategy.
SELECT SUM(sumLongSequential), SUM(sumFloatNormal) FROM foo WHERE dimSequential NOT LIKE '%3'
Benchmark (query) (rowsPerSegment) (storageType) (stringEncoding) (vectorize) Mode Cnt Score Error Units
SqlBenchmark.querySql 4 5000000 mmap none false avgt 5 94.069 ± 1.454 ms/op
SqlBenchmark.querySql 4 5000000 mmap none force avgt 5 45.142 ± 1.205 ms/op
SqlBenchmark.querySql 4 5000000 mmap front-coded-4 false avgt 5 90.089 ± 1.271 ms/op
SqlBenchmark.querySql 4 5000000 mmap front-coded-4 force avgt 5 44.842 ± 1.130 ms/op
SqlBenchmark.querySql 4 5000000 mmap front-coded-16 false avgt 5 90.121 ± 1.020 ms/op
SqlBenchmark.querySql 4 5000000 mmap front-coded-16 force avgt 5 44.965 ± 1.256 ms/op
-------
SELECT SUM(sumLongSequential), SUM(sumFloatNormal) FROM foo WHERE dimSequential = '311'
Benchmark (query) (rowsPerSegment) (storageType) (stringEncoding) (vectorize) Mode Cnt Score Error Units
SqlBenchmark.querySql 5 5000000 mmap none false avgt 5 23.650 ± 0.605 ms/op
SqlBenchmark.querySql 5 5000000 mmap none force avgt 5 22.420 ± 0.496 ms/op
SqlBenchmark.querySql 5 5000000 mmap front-coded-4 false avgt 5 23.756 ± 0.766 ms/op
SqlBenchmark.querySql 5 5000000 mmap front-coded-4 force avgt 5 22.045 ± 0.551 ms/op
SqlBenchmark.querySql 5 5000000 mmap front-coded-16 false avgt 5 23.671 ± 0.603 ms/op
SqlBenchmark.querySql 5 5000000 mmap front-coded-16 force avgt 5 22.044 ± 0.638 ms/op
-------
SELECT SUM(sumLongSequential), SUM(sumFloatNormal) FROM foo WHERE dimSequential NOT LIKE '%3' AND maxLongUniform > 10
Benchmark (query) (rowsPerSegment) (storageType) (stringEncoding) (vectorize) Mode Cnt Score Error Units
SqlBenchmark.querySql 6 5000000 mmap none false avgt 5 48.876 ± 1.715 ms/op
SqlBenchmark.querySql 6 5000000 mmap none force avgt 5 23.413 ± 0.713 ms/op
SqlBenchmark.querySql 6 5000000 mmap front-coded-4 false avgt 5 48.864 ± 1.748 ms/op
SqlBenchmark.querySql 6 5000000 mmap front-coded-4 force avgt 5 23.434 ± 0.901 ms/op
SqlBenchmark.querySql 6 5000000 mmap front-coded-16 false avgt 5 48.821 ± 1.737 ms/op
SqlBenchmark.querySql 6 5000000 mmap front-coded-16 force avgt 5 23.495 ± 0.684 ms/op
-------
SELECT
SUM(sumLongSequential) FILTER(WHERE dimSequential = '311'),
SUM(sumFloatNormal)
FROM foo WHERE dimSequential NOT LIKE '%3'
Benchmark (query) (rowsPerSegment) (storageType) (stringEncoding) (vectorize) Mode Cnt Score Error Units
SqlBenchmark.querySql 7 5000000 mmap none false avgt 5 96.444 ± 3.474 ms/op
SqlBenchmark.querySql 7 5000000 mmap none force avgt 5 51.797 ± 1.244 ms/op
SqlBenchmark.querySql 7 5000000 mmap front-coded-4 false avgt 5 96.150 ± 3.059 ms/op
SqlBenchmark.querySql 7 5000000 mmap front-coded-4 force avgt 5 51.678 ± 0.869 ms/op
SqlBenchmark.querySql 7 5000000 mmap front-coded-16 false avgt 5 107.410 ± 2.924 ms/op
SqlBenchmark.querySql 7 5000000 mmap front-coded-16 force avgt 5 51.833 ± 1.156 ms/op
-------
(timeseries bunch of filtered aggs)
Benchmark (query) (rowsPerSegment) (storageType) (stringEncoding) (vectorize) Mode Cnt Score Error Units
SqlBenchmark.querySql 8 5000000 mmap none false avgt 5 893.378 ± 3.220 ms/op
SqlBenchmark.querySql 8 5000000 mmap none force avgt 5 243.069 ± 3.936 ms/op
SqlBenchmark.querySql 8 5000000 mmap front-coded-4 false avgt 5 867.778 ± 13.031 ms/op
SqlBenchmark.querySql 8 5000000 mmap front-coded-4 force avgt 5 340.799 ± 4.697 ms/op
SqlBenchmark.querySql 8 5000000 mmap front-coded-16 false avgt 5 868.694 ± 18.088 ms/op
SqlBenchmark.querySql 8 5000000 mmap front-coded-16 force avgt 5 352.819 ± 2.469 ms/op
-------
SELECT dimSequential, dimZipf, SUM(sumLongSequential) FROM foo GROUP BY 1, 2
Benchmark (query) (rowsPerSegment) (storageType) (stringEncoding) (vectorize) Mode Cnt Score Error Units
SqlBenchmark.querySql 10 5000000 mmap none false avgt 5 430.592 ± 34.032 ms/op
SqlBenchmark.querySql 10 5000000 mmap none force avgt 5 235.971 ± 3.405 ms/op
SqlBenchmark.querySql 10 5000000 mmap front-coded-4 false avgt 5 426.965 ± 9.557 ms/op
SqlBenchmark.querySql 10 5000000 mmap front-coded-4 force avgt 5 247.754 ± 2.635 ms/op
SqlBenchmark.querySql 10 5000000 mmap front-coded-16 false avgt 5 434.664 ± 10.247 ms/op
SqlBenchmark.querySql 10 5000000 mmap front-coded-16 force avgt 5 250.683 ± 3.340 ms/op
-------
SELECT dimSequential, dimZipf, SUM(sumLongSequential), COUNT(*) FROM foo GROUP BY 1, 2
Benchmark (query) (rowsPerSegment) (storageType) (stringEncoding) (vectorize) Mode Cnt Score Error Units
SqlBenchmark.querySql 11 5000000 mmap none false avgt 5 433.815 ± 7.673 ms/op
SqlBenchmark.querySql 11 5000000 mmap none force avgt 5 255.151 ± 1.853 ms/op
SqlBenchmark.querySql 11 5000000 mmap front-coded-4 false avgt 5 442.550 ± 9.269 ms/op
SqlBenchmark.querySql 11 5000000 mmap front-coded-4 force avgt 5 263.782 ± 3.357 ms/op
SqlBenchmark.querySql 11 5000000 mmap front-coded-16 false avgt 5 441.626 ± 8.954 ms/op
SqlBenchmark.querySql 11 5000000 mmap front-coded-16 force avgt 5 268.237 ± 2.834 ms/op
-------
SELECT dimZipf FROM foo GROUP BY 1
Benchmark (query) (rowsPerSegment) (storageType) (stringEncoding) (vectorize) Mode Cnt Score Error Units
SqlBenchmark.querySql 12 5000000 mmap none false avgt 5 42.031 ± 1.408 ms/op
SqlBenchmark.querySql 12 5000000 mmap none force avgt 5 24.031 ± 0.881 ms/op
SqlBenchmark.querySql 12 5000000 mmap front-coded-4 false avgt 5 42.393 ± 0.648 ms/op
SqlBenchmark.querySql 12 5000000 mmap front-coded-4 force avgt 5 24.055 ± 1.008 ms/op
SqlBenchmark.querySql 12 5000000 mmap front-coded-16 false avgt 5 42.255 ± 1.134 ms/op
SqlBenchmark.querySql 12 5000000 mmap front-coded-16 force avgt 5 22.201 ± 1.120 ms/op
-------
(big union)
Benchmark (query) (rowsPerSegment) (storageType) (stringEncoding) (vectorize) Mode Cnt Score Error Units
SqlBenchmark.querySql 19 5000000 mmap none false avgt 5 365.313 ± 5.432 ms/op
SqlBenchmark.querySql 19 5000000 mmap none force avgt 5 315.686 ± 6.720 ms/op
SqlBenchmark.querySql 19 5000000 mmap front-coded-4 false avgt 5 477.117 ± 67.235 ms/op
SqlBenchmark.querySql 19 5000000 mmap front-coded-4 force avgt 5 404.028 ± 8.931 ms/op
SqlBenchmark.querySql 19 5000000 mmap front-coded-16 false avgt 5 489.550 ± 11.902 ms/op
SqlBenchmark.querySql 19 5000000 mmap front-coded-16 force avgt 5 443.758 ± 9.031 ms/op
-------
SELECT dimSequential, dimZipf, SUM(sumLongSequential) FROM foo WHERE dimUniform NOT LIKE '%3' GROUP BY 1, 2
Benchmark (query) (rowsPerSegment) (storageType) (stringEncoding) (vectorize) Mode Cnt Score Error Units
SqlBenchmark.querySql 21 5000000 mmap none false avgt 5 444.995 ± 5.940 ms/op
SqlBenchmark.querySql 21 5000000 mmap none force avgt 5 273.133 ± 1.405 ms/op
SqlBenchmark.querySql 21 5000000 mmap front-coded-4 false avgt 5 450.684 ± 13.259 ms/op
SqlBenchmark.querySql 21 5000000 mmap front-coded-4 force avgt 5 279.336 ± 2.098 ms/op
SqlBenchmark.querySql 21 5000000 mmap front-coded-16 false avgt 5 456.444 ± 9.268 ms/op
SqlBenchmark.querySql 21 5000000 mmap front-coded-16 force avgt 5 283.369 ± 2.217 ms/op
-------
SELECT dimZipf, SUM(sumLongSequential) FROM foo WHERE dimSequential = '311' GROUP BY 1 ORDER BY 1
Benchmark (query) (rowsPerSegment) (storageType) (stringEncoding) (vectorize) Mode Cnt Score Error Units
SqlBenchmark.querySql 22 5000000 mmap none false avgt 5 15.510 ± 0.522 ms/op
SqlBenchmark.querySql 22 5000000 mmap none force avgt 5 15.380 ± 0.475 ms/op
SqlBenchmark.querySql 22 5000000 mmap front-coded-4 false avgt 5 15.565 ± 0.652 ms/op
SqlBenchmark.querySql 22 5000000 mmap front-coded-4 force avgt 5 15.372 ± 0.504 ms/op
SqlBenchmark.querySql 22 5000000 mmap front-coded-16 false avgt 5 15.544 ± 0.667 ms/op
SqlBenchmark.querySql 22 5000000 mmap front-coded-16 force avgt 5 15.496 ± 0.441 ms/op
-------
SELECT * FROM foo
Benchmark (query) (rowsPerSegment) (storageType) (stringEncoding) (vectorize) Mode Cnt Score Error Units
SqlBenchmark.querySql 23 5000000 mmap none force avgt 5 11097.105 ± 40.113 ms/op
SqlBenchmark.querySql 23 5000000 mmap front-coded-4 force avgt 5 11968.038 ± 125.990 ms/op
SqlBenchmark.querySql 23 5000000 mmap front-coded-16 force avgt 5 13985.780 ± 74.495 ms/op
-------
SELECT * FROM foo WHERE dimSequential IN ('1', '2', '3', '4', '5', '10', '11', '20', '21', '23', '40', '50', '64', '70', '100')
Benchmark (query) (rowsPerSegment) (storageType) (stringEncoding) (vectorize) Mode Cnt Score Error Units
SqlBenchmark.querySql 24 5000000 mmap none force avgt 5 255.461 ± 2.556 ms/op
SqlBenchmark.querySql 24 5000000 mmap front-coded-4 force avgt 5 258.501 ± 2.187 ms/op
SqlBenchmark.querySql 24 5000000 mmap front-coded-16 force avgt 5 298.738 ± 2.915 ms/op
-------
SELECT * FROM foo WHERE dimSequential > '10' AND dimSequential < '8500'
Benchmark (query) (rowsPerSegment) (storageType) (stringEncoding) (vectorize) Mode Cnt Score Error Units
SqlBenchmark.querySql 25 5000000 mmap none force avgt 5 9171.879 ± 25.495 ms/op
SqlBenchmark.querySql 25 5000000 mmap front-coded-4 force avgt 5 8819.167 ± 50.877 ms/op
SqlBenchmark.querySql 25 5000000 mmap front-coded-16 force avgt 5 10618.384 ± 124.553 ms/op
-------
SELECT dimSequential, dimZipf, SUM(sumLongSequential) FROM foo WHERE dimSequential IN ('1', '2', '3', '4', '5', '10', '11', '20', '21', '23', '40', '50', '64', '70', '100') GROUP BY 1, 2
Benchmark (query) (rowsPerSegment) (storageType) (stringEncoding) (vectorize) Mode Cnt Score Error Units
SqlBenchmark.querySql 26 5000000 mmap none false avgt 5 22.775 ± 0.811 ms/op
SqlBenchmark.querySql 26 5000000 mmap none force avgt 5 18.762 ± 0.654 ms/op
SqlBenchmark.querySql 26 5000000 mmap front-coded-4 false avgt 5 22.925 ± 0.772 ms/op
SqlBenchmark.querySql 26 5000000 mmap front-coded-4 force avgt 5 18.920 ± 0.610 ms/op
SqlBenchmark.querySql 26 5000000 mmap front-coded-16 false avgt 5 22.989 ± 0.813 ms/op
SqlBenchmark.querySql 26 5000000 mmap front-coded-16 force avgt 5 18.989 ± 0.571 ms/op
-------
SELECT dimSequential, dimZipf, SUM(sumLongSequential) FROM foo WHERE dimSequential > '10' AND dimSequential < '8500' GROUP BY 1, 2
Benchmark (query) (rowsPerSegment) (storageType) (stringEncoding) (vectorize) Mode Cnt Score Error Units
SqlBenchmark.querySql 27 5000000 mmap none false avgt 5 377.893 ± 13.541 ms/op
SqlBenchmark.querySql 27 5000000 mmap none force avgt 5 228.251 ± 2.652 ms/op
SqlBenchmark.querySql 27 5000000 mmap front-coded-4 false avgt 5 387.915 ± 11.770 ms/op
SqlBenchmark.querySql 27 5000000 mmap front-coded-4 force avgt 5 234.894 ± 2.848 ms/op
SqlBenchmark.querySql 27 5000000 mmap front-coded-16 false avgt 5 400.435 ± 10.410 ms/op
SqlBenchmark.querySql 27 5000000 mmap front-coded-16 force avgt 5 237.497 ± 2.143 ms/op
I have yet to perform larger-scale testing on actual clusters with real workloads, but the benchmark results look very promising at this point and show very little overhead at query time, with a decent chance that the reduced segment sizes will more than make up for it.
Design
The encoding itself is done within a new Indexed implementation, FrontCodedIndexed, which contains all of the methods for reading and writing the buckets of values. I adapted 'variable byte' encoding for integer values from JavaFastPFOR to write both string value lengths and prefix lengths.
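As a rough illustration of variable byte ('vbyte') encoding, here is a minimal sketch of one common variant, in which the high bit of the final byte marks the end of the value. The exact bit convention used by this PR's VByte class is an assumption here; the sketch just shows the general technique of packing 7 bits per byte:

```java
import java.nio.ByteBuffer;

public class VByteSketch
{
  // Write a non-negative int using 7 bits per byte; the high bit is set
  // on the final byte to terminate the value.
  static void writeVByte(ByteBuffer buffer, int value)
  {
    while (value >= 0x80) {
      buffer.put((byte) (value & 0x7F));
      value >>>= 7;
    }
    buffer.put((byte) (value | 0x80));
  }

  // Read a value written by writeVByte, accumulating 7 bits at a time
  // until a byte with the high bit set is seen.
  static int readVByte(ByteBuffer buffer)
  {
    int value = 0;
    int shift = 0;
    while (true) {
      final byte b = buffer.get();
      if ((b & 0x80) != 0) {
        return value | ((b & 0x7F) << shift);
      }
      value |= (b & 0x7F) << shift;
      shift += 7;
    }
  }
}
```

The payoff for front coding is that small lengths (which prefix lengths usually are) cost a single byte instead of four.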
string layout:
length | value |
---|---|
vbyte int | byte[] |
bucket layout:
first string | prefix length | string fragment | ... | prefix length | string fragment |
---|---|---|---|---|---|
byte[] | vbyte int | byte[] | ... | vbyte int | byte[] |
front coded indexed layout:
version | bucket size | has null? | number of values | size of "offsets" + "buckets" | "offsets" | "buckets" |
---|---|---|---|---|---|---|
byte | byte | byte | vbyte int | vbyte int | int[] | bucket[] |
Note that the "offsets" store the starting locations of all buckets beyond the first (the first bucket is known to start at the end of the "offsets" section). The offsets are stored as plain integers rather than vbyte-encoded to allow fast access to bucket positions, but they are probably a good candidate for delta-encoded byte packing to further decrease their size.
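Because bucket sizes are fixed, locating the bucket for a dictionary index needs only a division and at most one offset read. A sketch with assumed names (the real code also has to account for the header and the null entry):

```java
public class BucketLookup
{
  // Returns the byte position, relative to the start of the buckets section,
  // at which the bucket containing dictionary index `index` begins.
  // offsets[k] holds the start of bucket k + 1; bucket 0 starts at position 0.
  static int bucketStart(int[] offsets, int index, int bucketSize)
  {
    final int bucket = index / bucketSize;
    return bucket == 0 ? 0 : offsets[bucket - 1];
  }
}
```

From the bucket start, a reader decodes the first value whole, then applies `index % bucketSize` prefix/fragment pairs to reconstruct the target value.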
Using it
This functionality can be utilized via a new IndexSpec property, stringDictionaryEncoding, which can be set to {"type":"front-coded", "bucketSize": 4} or {"type":"front-coded", "bucketSize": 16} (or similar) to instruct indexing tasks to write segments with compressed dictionaries using bucket size 4 or 16 respectively; {"type":"utf8"} is the default.
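For example, an indexSpec fragment enabling the new encoding might look like the following (the surrounding field placement is inferred from the description above, so treat it as illustrative rather than authoritative):

```json
{
  "indexSpec": {
    "stringDictionaryEncoding": {
      "type": "front-coded",
      "bucketSize": 4
    }
  }
}
```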
This mode is not yet the default because segments written this way are unreadable by older versions of Druid, so care must be taken before migrating to this encoding. Additionally, it needs a lot more testing and measurement to ensure it is genuinely better in most cases before becoming the default, but it looks pretty promising.
Testing
Besides the direct tests on FrontCodedIndexed and VByte, I also wired a front-coded segment into both BaseFilterTest and QueryTestHelper, which provides a rather wide set of test coverage across a variety of scenarios. This process found a number of bugs in my initial commits, so I feel reasonably confident that things are correct at this point.
Future work
Before I started coding on this, in addition to the paper linked in #3922 (https://arxiv.org/pdf/1101.5506.pdf), I also read through https://arxiv.org/pdf/1911.08372.pdf, a newer paper by one of the same authors, and stumbled upon https://link.springer.com/content/pdf/10.1007/s00778-020-00620-x.pdf. All of these detail additional (much fancier) improvements that can be made to this strategy by further coding the string values (https://en.wikipedia.org/wiki/Re-Pair seems to be the primary focus in these papers). It would probably be worth investigating what additional size improvements can be gained, and at what cost.
Additionally, it would be ideal to vary which encoding is used per column instead of setting it at the segment level (this seems true of other types of compression as well). This would allow collecting statistics at indexing time to determine how likely this encoding is to be useful, such as minimum value cardinality thresholds or similar (akin to the 'auto' encoding available for long columns).
Key changed/added classes in this PR
- VByte
- FrontCodedIndexed
- FrontCodedIndexedWriter
- StringDimensionMergerV9
- DictionaryEncodedColumnPartSerde
- DictionaryEncodedStringColumnIndexSupplier (internal classes split out into several new files)
- StringFrontCodedDictionaryEncodedColumn
- StringFrontCodedDictionaryEncodedColumnSupplier
- StringFrontCodedColumnIndexSupplier
- NestedDataColumnMerger
- NestedDataColumnSupplier
- CompressedNestedDataComplexColumn
- NestedFieldLiteralColumnIndexSupplier
- NestedFieldLiteralDictionaryEncodedColumn
This PR has:
- [x] been self-reviewed.
- [ ] added documentation for new or modified features or behaviors.
- [x] added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
- [x] added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
- [x] added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
- [x] been tested in a test Druid cluster.
This pull request introduces 2 alerts when merging aadf6cda0e3b4405a44f493a9d59a0ccf886dde1 into b1640a72ee1090bd3c90240bf47a5bfd5b690f5a - view on LGTM.com
new alerts:
- 2 for Subtle call to inherited method
Some additional less scientific measurements, using a 10GB file of the nyc taxi dataset with all of the columns stored as strings:
grouping performance seems competitive:
select * does show a performance decrease as the earlier benchmarks suggested:
I still haven't had the chance to spend any time optimizing the code, but the size savings definitely make this feel worth considering for clusters where the typical workload does not include queries that hit a large number of columns, such as "wide" scans ("select *", etc.) or large group bys.
There aren't any explicit conflicts, but this PR needs to be updated after #12315 to implement the new methods; I'll try to see if I can consolidate some of the code a bit better when I fix it up.
This pull request introduces 1 alert when merging 28cf810ee57a32180ee471ab888a5b439de912ba into 331e6d707b003391d44ee583dbc328b2b6359425 - view on LGTM.com
new alerts:
- 1 for Inconsistent equals and hashCode
2.5m rows of data generated with https://github.com/lemire/StarSchemaBenchmark:

also (possibly coincidence) the smaller encoding seems to have a small positive effect on ingest task time:
ran taxi set again with mixed types, it gets pretty good improvements from the date/time strings:
also sometimes the improvement is not a lot, pretty small overlap in this one:
the slowdown on select * is also no longer quite as dramatic as in my first screenshots:
For reference, these were the first set of benchmarks when I opened the PR (since I have edited the PR description with the updated benchmarks):
Benchmark (indexType) (numElements) (numOperations) (width) Mode Cnt Score Error Units
FrontCodedIndexedBenchmark.get generic 10000 10000 16 avgt 5 78.819 ± 2.541 ns/op
FrontCodedIndexedBenchmark.get:encoded size generic 10000 10000 16 avgt 5 240010.000 bytes
FrontCodedIndexedBenchmark.get front-coded-4 10000 10000 16 avgt 5 220.887 ± 19.267 ns/op
FrontCodedIndexedBenchmark.get:encoded size front-coded-4 10000 10000 16 avgt 5 169894.000 bytes
FrontCodedIndexedBenchmark.get front-coded-16 10000 10000 16 avgt 5 629.621 ± 59.667 ns/op
FrontCodedIndexedBenchmark.get:encoded size front-coded-16 10000 10000 16 avgt 5 164428.000 bytes
FrontCodedIndexedBenchmark.indexOf generic 10000 10000 16 avgt 5 1228.012 ± 100.691 ns/op
FrontCodedIndexedBenchmark.indexOf:encoded size generic 10000 10000 16 avgt 5 240010.000 bytes
FrontCodedIndexedBenchmark.indexOf front-coded-4 10000 10000 16 avgt 5 2101.530 ± 209.465 ns/op
FrontCodedIndexedBenchmark.indexOf:encoded size front-coded-4 10000 10000 16 avgt 5 169908.000 bytes
FrontCodedIndexedBenchmark.indexOf front-coded-16 10000 10000 16 avgt 5 2364.248 ± 156.352 ns/op
FrontCodedIndexedBenchmark.indexOf:encoded size front-coded-16 10000 10000 16 avgt 5 164321.000 bytes
FrontCodedIndexedBenchmark.iterator generic 10000 10000 16 avgt 5 80.247 ± 1.639 ns/op
FrontCodedIndexedBenchmark.iterator:encoded size generic 10000 10000 16 avgt 5 240010.000 bytes
FrontCodedIndexedBenchmark.iterator front-coded-4 10000 10000 16 avgt 5 101.921 ± 6.225 ns/op
FrontCodedIndexedBenchmark.iterator:encoded size front-coded-4 10000 10000 16 avgt 5 169889.000 bytes
FrontCodedIndexedBenchmark.iterator front-coded-16 10000 10000 16 avgt 5 105.235 ± 2.333 ns/op
FrontCodedIndexedBenchmark.iterator:encoded size front-coded-16 10000 10000 16 avgt 5 164221.000 bytes
FrontCodedIndexedBenchmark.get generic 100000 10000 16 avgt 5 95.947 ± 3.842 ns/op
FrontCodedIndexedBenchmark.get:encoded size generic 100000 10000 16 avgt 5 2400010.000 bytes
FrontCodedIndexedBenchmark.get front-coded-4 100000 10000 16 avgt 5 239.630 ± 40.148 ns/op
FrontCodedIndexedBenchmark.get:encoded size front-coded-4 100000 10000 16 avgt 5 1636718.000 bytes
FrontCodedIndexedBenchmark.get front-coded-16 100000 10000 16 avgt 5 648.001 ± 28.965 ns/op
FrontCodedIndexedBenchmark.get:encoded size front-coded-16 100000 10000 16 avgt 5 1564861.000 bytes
FrontCodedIndexedBenchmark.indexOf generic 100000 10000 16 avgt 5 1612.575 ± 149.556 ns/op
FrontCodedIndexedBenchmark.indexOf:encoded size generic 100000 10000 16 avgt 5 2400010.000 bytes
FrontCodedIndexedBenchmark.indexOf front-coded-4 100000 10000 16 avgt 5 2934.297 ± 176.182 ns/op
FrontCodedIndexedBenchmark.indexOf:encoded size front-coded-4 100000 10000 16 avgt 5 1636652.000 bytes
FrontCodedIndexedBenchmark.indexOf front-coded-16 100000 10000 16 avgt 5 3235.271 ± 438.586 ns/op
FrontCodedIndexedBenchmark.indexOf:encoded size front-coded-16 100000 10000 16 avgt 5 1563940.000 bytes
FrontCodedIndexedBenchmark.iterator generic 100000 10000 16 avgt 5 787.253 ± 83.715 ns/op
FrontCodedIndexedBenchmark.iterator:encoded size generic 100000 10000 16 avgt 5 2400010.000 bytes
FrontCodedIndexedBenchmark.iterator front-coded-4 100000 10000 16 avgt 5 1048.532 ± 29.200 ns/op
FrontCodedIndexedBenchmark.iterator:encoded size front-coded-4 100000 10000 16 avgt 5 1636546.000 bytes
FrontCodedIndexedBenchmark.iterator front-coded-16 100000 10000 16 avgt 5 1054.068 ± 66.667 ns/op
FrontCodedIndexedBenchmark.iterator:encoded size front-coded-16 100000 10000 16 avgt 5 1565019.000 bytes
Benchmark (query) (rowsPerSegment) (stringEncoding) (vectorize) Mode Cnt Score Error Units
SqlBenchmark.querySql 4 5000000 none false avgt 5 152.079 ± 7.234 ms/op
SqlBenchmark.querySql 4 5000000 none force avgt 5 76.931 ± 3.635 ms/op
SqlBenchmark.querySql 4 5000000 fc4 false avgt 5 151.430 ± 11.542 ms/op
SqlBenchmark.querySql 4 5000000 fc4 force avgt 5 77.045 ± 3.972 ms/op
SqlBenchmark.querySql 4 5000000 fc16 false avgt 5 160.475 ± 16.692 ms/op
SqlBenchmark.querySql 4 5000000 fc16 force avgt 5 75.423 ± 4.572 ms/op
SqlBenchmark.querySql 5 5000000 none false avgt 5 49.356 ± 2.468 ms/op
SqlBenchmark.querySql 5 5000000 none force avgt 5 49.141 ± 2.853 ms/op
SqlBenchmark.querySql 5 5000000 fc4 false avgt 5 49.169 ± 2.025 ms/op
SqlBenchmark.querySql 5 5000000 fc4 force avgt 5 49.232 ± 2.801 ms/op
SqlBenchmark.querySql 5 5000000 fc16 false avgt 5 49.438 ± 2.585 ms/op
SqlBenchmark.querySql 5 5000000 fc16 force avgt 5 49.451 ± 3.924 ms/op
SqlBenchmark.querySql 6 5000000 none false avgt 5 199.922 ± 7.085 ms/op
SqlBenchmark.querySql 6 5000000 none force avgt 5 105.862 ± 7.491 ms/op
SqlBenchmark.querySql 6 5000000 fc4 false avgt 5 209.565 ± 21.137 ms/op
SqlBenchmark.querySql 6 5000000 fc4 force avgt 5 106.155 ± 7.080 ms/op
SqlBenchmark.querySql 6 5000000 fc16 false avgt 5 199.692 ± 11.608 ms/op
SqlBenchmark.querySql 6 5000000 fc16 force avgt 5 107.823 ± 9.708 ms/op
SqlBenchmark.querySql 7 5000000 none false avgt 5 154.474 ± 7.260 ms/op
SqlBenchmark.querySql 7 5000000 none force avgt 5 84.477 ± 4.757 ms/op
SqlBenchmark.querySql 7 5000000 fc4 false avgt 5 150.924 ± 8.694 ms/op
SqlBenchmark.querySql 7 5000000 fc4 force avgt 5 82.500 ± 4.569 ms/op
SqlBenchmark.querySql 7 5000000 fc16 false avgt 5 151.421 ± 7.497 ms/op
SqlBenchmark.querySql 7 5000000 fc16 force avgt 5 86.868 ± 3.030 ms/op
SqlBenchmark.querySql 10 5000000 none false avgt 5 780.820 ± 68.020 ms/op
SqlBenchmark.querySql 10 5000000 none force avgt 5 432.962 ± 62.454 ms/op
SqlBenchmark.querySql 10 5000000 fc4 false avgt 5 798.933 ± 71.454 ms/op
SqlBenchmark.querySql 10 5000000 fc4 force avgt 5 458.484 ± 37.257 ms/op
SqlBenchmark.querySql 10 5000000 fc16 false avgt 5 872.225 ± 49.937 ms/op
SqlBenchmark.querySql 10 5000000 fc16 force avgt 5 522.340 ± 35.679 ms/op
SqlBenchmark.querySql 12 5000000 none false avgt 5 69.159 ± 4.653 ms/op
SqlBenchmark.querySql 12 5000000 none force avgt 5 33.735 ± 2.004 ms/op
SqlBenchmark.querySql 12 5000000 fc4 false avgt 5 69.006 ± 4.162 ms/op
SqlBenchmark.querySql 12 5000000 fc4 force avgt 5 33.413 ± 1.792 ms/op
SqlBenchmark.querySql 12 5000000 fc16 false avgt 5 68.738 ± 3.524 ms/op
SqlBenchmark.querySql 12 5000000 fc16 force avgt 5 34.627 ± 2.654 ms/op
SqlBenchmark.querySql 19 5000000 none false avgt 5 700.311 ± 100.489 ms/op
SqlBenchmark.querySql 19 5000000 none force avgt 5 601.641 ± 39.403 ms/op
SqlBenchmark.querySql 19 5000000 fc4 false avgt 5 1056.911 ± 91.692 ms/op
SqlBenchmark.querySql 19 5000000 fc4 force avgt 5 963.832 ± 55.223 ms/op
SqlBenchmark.querySql 19 5000000 fc16 false avgt 5 2159.028 ± 145.168 ms/op
SqlBenchmark.querySql 19 5000000 fc16 force avgt 5 2107.042 ± 123.906 ms/op
SqlBenchmark.querySql 21 5000000 none false avgt 5 773.091 ± 46.505 ms/op
SqlBenchmark.querySql 21 5000000 none force avgt 5 459.661 ± 36.800 ms/op
SqlBenchmark.querySql 21 5000000 fc4 false avgt 5 837.426 ± 41.612 ms/op
SqlBenchmark.querySql 21 5000000 fc4 force avgt 5 504.546 ± 18.115 ms/op
SqlBenchmark.querySql 21 5000000 fc16 false avgt 5 919.155 ± 106.868 ms/op
SqlBenchmark.querySql 21 5000000 fc16 force avgt 5 602.116 ± 7.865 ms/op
SqlBenchmark.querySql 22 5000000 none false avgt 5 44.268 ± 2.707 ms/op
SqlBenchmark.querySql 22 5000000 none force avgt 5 43.919 ± 2.962 ms/op
SqlBenchmark.querySql 22 5000000 fc4 false avgt 5 44.458 ± 3.590 ms/op
SqlBenchmark.querySql 22 5000000 fc4 force avgt 5 43.738 ± 2.317 ms/op
SqlBenchmark.querySql 22 5000000 fc16 false avgt 5 44.304 ± 2.876 ms/op
SqlBenchmark.querySql 22 5000000 fc16 force avgt 5 43.906 ± 2.283 ms/op
SqlBenchmark.querySql 23 5000000 none false avgt 5 18260.991 ± 1041.530 ms/op
SqlBenchmark.querySql 23 5000000 none force avgt 5 18400.261 ± 1689.495 ms/op
SqlBenchmark.querySql 23 5000000 fc4 false avgt 5 30704.667 ± 1624.680 ms/op
SqlBenchmark.querySql 23 5000000 fc4 force avgt 5 30603.476 ± 621.976 ms/op
SqlBenchmark.querySql 23 5000000 fc16 false avgt 5 48777.535 ± 3223.712 ms/op
SqlBenchmark.querySql 23 5000000 fc16 force avgt 5 49966.818 ± 3739.684 ms/op
We need to add some query benchmarks that include more realistic filter lists. Likely, we should have one query with an IN filter with 100 entries, another with 1000 entries and a third with 10,000 entries. If we want to vary it, we should run them each with different "hit ratios", but for now, we can perhaps give them a 20% overlap with actual values that exist in the dictionaries. If we do that, I suspect that we will start exercising some of the code paths that I'm curious about.
This is currently sort of provided by the FrontCodedIndexedBenchmark, which I think is the most appropriate place to measure that directly, though I only did 10k entries and didn't spend much time adding non-existent values since I wasn't specifically targeting the IN filter. I don't think the general SQL benchmark is the most appropriate place to go that in-depth on specific filters, and in my experience, most of the IN filters with 1k+ entries I have seen are machine-generated native queries rather than SQL, since the overhead of parsing that much text as SQL and then planning it also starts to stick out.
On the IN filter benchmarks, I've definitely seen client-generated queries with large IN filters. Looker generates them sometimes, other tools generate them as well, and there are definitely applications that generate them. It is programmatically generated, yes, but it comes from external sources. I forget whether we've fixed this yet, but this is common enough that we have (or had?) a known issue around SQL parsing of large IN filters, where the parser first puts them all into ORs before converting back to IN, and our planning code for some reason likes to do O(n^2) passes over ORs. Having a benchmark that also covers that case would be nice.
When I look at the massive UNION ALL query in query 19, there is an actual difference in timings between GenericIndexed and FrontCodedIndexed for that query. I just wonder if that's because of the IN clause at the very end of that query...
Just looked again at the FrontCodedIndexBenchmark and I see the GenericIndexed in there too. I can't help but wonder if the differences there are actually from the fact that FrontCoded is dealing with UTF8 bytes instead of String. I.e. all of the glut of dealing with String is equivalent to the glut of the extra objects and they are just equaling each other out. If that's the case, then there's even more benefit to be had from the FrontCodedIndexBenchmark. The other thing that's much more difficult to measure, but that we have seen, is that garbage invokes GC and GC stops all queries from making progress. This can limit the overall level of concurrency of the distributed system and can be difficult to measure in a benchmark.
> Just looked again at the FrontCodedIndexBenchmark and I see the GenericIndexed in there too. I can't help but wonder if the differences there are actually from the fact that FrontCoded is dealing with UTF8 bytes instead of String. I.e. all of the glut of dealing with String is equivalent to the glut of the extra objects and they are just equaling each other out. If that's the case, then there's even more benefit to be had from the FrontCodedIndexBenchmark.
Yeah, I don't think FrontCodedIndexed has that much more garbage than regular string columns using GenericIndexed; a bit more, but not a ton. I would agree that there is a lot of room for improvement (probably with GenericIndexed string columns too). In FrontCodedIndexBenchmark, the FrontCodedIndexed is converting values to String and from ByteBuffer as well, so it should be apples to apples, I think, mostly. My earlier implementation of FrontCodedIndexed was Indexed&lt;String&gt;, since this PR predated #12517; you can see the older numbers here: https://github.com/apache/druid/pull/12277#issuecomment-1255538371
I spent the last couple of days optimizing this heavily and it seems to have paid off: I've managed to eliminate almost all of the performance difference between this and standard GenericIndexed-based string columns, and the remaining overhead looks acceptable given the size savings. I've updated the benchmarks in the PR description to reflect this.
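For anyone skimming the thread, here's a toy illustration of the front-coding scheme from the PR description — not the actual FrontCodedIndexed buffer layout or code, just the idea that within a bucket of sorted values the first value is stored whole and each later value as a (prefix length relative to the first value, suffix) pair:

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class FrontCodingSketch {
  static int commonPrefixLength(byte[] a, byte[] b) {
    int n = Math.min(a.length, b.length);
    int i = 0;
    while (i < n && a[i] == b[i]) {
      i++;
    }
    return i;
  }

  // Encode one bucket: first value stored whole (prefix length 0),
  // the rest as (shared prefix length with the first value, remaining suffix).
  static List<Object[]> encodeBucket(List<String> sortedValues) {
    List<Object[]> bucket = new ArrayList<>();
    byte[] first = sortedValues.get(0).getBytes(StandardCharsets.UTF_8);
    bucket.add(new Object[]{0, first});
    for (int i = 1; i < sortedValues.size(); i++) {
      byte[] bytes = sortedValues.get(i).getBytes(StandardCharsets.UTF_8);
      int prefix = commonPrefixLength(first, bytes);
      byte[] suffix = new byte[bytes.length - prefix];
      System.arraycopy(bytes, prefix, suffix, 0, suffix.length);
      bucket.add(new Object[]{prefix, suffix});
    }
    return bucket;
  }

  // Decode value i by stitching the stored prefix of the first value onto the suffix.
  static String decode(List<Object[]> bucket, int i) {
    byte[] first = (byte[]) bucket.get(0)[1];
    int prefix = (int) bucket.get(i)[0];
    byte[] suffix = (byte[]) bucket.get(i)[1];
    byte[] out = new byte[prefix + suffix.length];
    System.arraycopy(first, 0, out, 0, prefix);
    System.arraycopy(suffix, 0, out, prefix, suffix.length);
    return new String(out, StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    List<Object[]> bucket = encodeBucket(List.of("druid", "druidic", "drum", "dry"));
    System.out.println(decode(bucket, 2)); // prints "drum"
  }
}
```

The real implementation works over ByteBuffers with variable-length integers rather than object lists, but the encode/decode round trip is the same shape.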
I ran a few of the SqlNestedDataBenchmark queries that do string stuff just to spot check, and things look good there too:
SELECT string1, SUM(long1) FROM foo GROUP BY 1 ORDER BY 2
SELECT JSON_VALUE(nested, '$.nesteder.string1'), SUM(JSON_VALUE(nested, '$.long1' RETURNING BIGINT)) FROM foo GROUP BY 1 ORDER BY 2
Benchmark (query) (rowsPerSegment) (stringEncoding) (vectorize) Mode Cnt Score Error Units
SqlNestedDataBenchmark.querySql 6 5000000 none false avgt 5 229.141 ± 4.357 ms/op
SqlNestedDataBenchmark.querySql 6 5000000 none force avgt 5 158.286 ± 1.982 ms/op
SqlNestedDataBenchmark.querySql 6 5000000 front-coded-4 false avgt 5 226.019 ± 2.990 ms/op
SqlNestedDataBenchmark.querySql 6 5000000 front-coded-4 force avgt 5 154.666 ± 0.682 ms/op
SqlNestedDataBenchmark.querySql 6 5000000 front-coded-16 false avgt 5 218.805 ± 3.507 ms/op
SqlNestedDataBenchmark.querySql 6 5000000 front-coded-16 force avgt 5 159.220 ± 9.396 ms/op
SqlNestedDataBenchmark.querySql 7 5000000 none false avgt 5 379.591 ± 6.253 ms/op
SqlNestedDataBenchmark.querySql 7 5000000 none force avgt 5 196.781 ± 3.562 ms/op
SqlNestedDataBenchmark.querySql 7 5000000 front-coded-4 false avgt 5 369.041 ± 4.383 ms/op
SqlNestedDataBenchmark.querySql 7 5000000 front-coded-4 force avgt 5 197.589 ± 3.049 ms/op
SqlNestedDataBenchmark.querySql 7 5000000 front-coded-16 false avgt 5 379.980 ± 2.840 ms/op
SqlNestedDataBenchmark.querySql 7 5000000 front-coded-16 force avgt 5 198.248 ± 4.503 ms/op
SELECT SUM(long1) FROM foo WHERE string1 = '10000' OR string1 = '1000'
SELECT SUM(JSON_VALUE(nested, '$.long1' RETURNING BIGINT)) FROM foo WHERE JSON_VALUE(nested, '$.nesteder.string1') = '10000' OR JSON_VALUE(nested, '$.nesteder.string1') = '1000'
Benchmark (query) (rowsPerSegment) (stringEncoding) (vectorize) Mode Cnt Score Error Units
SqlNestedDataBenchmark.querySql 10 5000000 none false avgt 5 11.487 ± 0.236 ms/op
SqlNestedDataBenchmark.querySql 10 5000000 none force avgt 5 11.472 ± 0.201 ms/op
SqlNestedDataBenchmark.querySql 10 5000000 front-coded-4 false avgt 5 11.509 ± 0.198 ms/op
SqlNestedDataBenchmark.querySql 10 5000000 front-coded-4 force avgt 5 11.510 ± 0.297 ms/op
SqlNestedDataBenchmark.querySql 10 5000000 front-coded-16 false avgt 5 11.480 ± 0.288 ms/op
SqlNestedDataBenchmark.querySql 10 5000000 front-coded-16 force avgt 5 11.458 ± 0.270 ms/op
SqlNestedDataBenchmark.querySql 11 5000000 none false avgt 5 11.650 ± 0.274 ms/op
SqlNestedDataBenchmark.querySql 11 5000000 none force avgt 5 11.674 ± 0.254 ms/op
SqlNestedDataBenchmark.querySql 11 5000000 front-coded-4 false avgt 5 11.681 ± 0.312 ms/op
SqlNestedDataBenchmark.querySql 11 5000000 front-coded-4 force avgt 5 11.672 ± 0.340 ms/op
SqlNestedDataBenchmark.querySql 11 5000000 front-coded-16 false avgt 5 11.792 ± 0.383 ms/op
SqlNestedDataBenchmark.querySql 11 5000000 front-coded-16 force avgt 5 11.809 ± 0.422 ms/op
SELECT long1, SUM(double3) FROM foo WHERE string1 = '10000' OR string1 = '1000' GROUP BY 1 ORDER BY 2
SELECT JSON_VALUE(nested, '$.long1' RETURNING BIGINT), SUM(JSON_VALUE(nested, '$.nesteder.double3' RETURNING DOUBLE)) FROM foo WHERE JSON_VALUE(nested, '$.nesteder.string1') = '10000' OR JSON_VALUE(nested, '$.nesteder.string1') = '1000' GROUP BY 1 ORDER BY 2
Benchmark (query) (rowsPerSegment) (stringEncoding) (vectorize) Mode Cnt Score Error Units
SqlNestedDataBenchmark.querySql 16 5000000 none false avgt 5 126.009 ± 1.829 ms/op
SqlNestedDataBenchmark.querySql 16 5000000 none force avgt 5 125.930 ± 2.802 ms/op
SqlNestedDataBenchmark.querySql 16 5000000 front-coded-4 false avgt 5 125.991 ± 1.981 ms/op
SqlNestedDataBenchmark.querySql 16 5000000 front-coded-4 force avgt 5 126.098 ± 4.202 ms/op
SqlNestedDataBenchmark.querySql 16 5000000 front-coded-16 false avgt 5 125.795 ± 6.560 ms/op
SqlNestedDataBenchmark.querySql 16 5000000 front-coded-16 force avgt 5 126.172 ± 3.807 ms/op
SqlNestedDataBenchmark.querySql 17 5000000 none false avgt 5 126.375 ± 2.382 ms/op
SqlNestedDataBenchmark.querySql 17 5000000 none force avgt 5 125.585 ± 0.396 ms/op
SqlNestedDataBenchmark.querySql 17 5000000 front-coded-4 false avgt 5 125.678 ± 2.668 ms/op
SqlNestedDataBenchmark.querySql 17 5000000 front-coded-4 force avgt 5 125.355 ± 2.104 ms/op
SqlNestedDataBenchmark.querySql 17 5000000 front-coded-16 false avgt 5 127.011 ± 5.057 ms/op
SqlNestedDataBenchmark.querySql 17 5000000 front-coded-16 force avgt 5 126.835 ± 3.172 ms/op
Segment sizes for this benchmark are ~3.4GB instead of 3.6GB, but it has the same issue as SqlBenchmark with the data generator: the data is all stringified numbers, so it is limited in its ability to benefit from this encoding.
Got it. Thanks for explaining.
In processing/src/main/java/org/apache/druid/segment/nested/NestedDataColumnSupplier.java https://github.com/apache/druid/pull/12277#discussion_r1002613660:
dictionary = GenericIndexed.read(stringDictionaryBuffer, GenericIndexed.BYTE_BUFFER_STRATEGY, mapper);
frontCodedDictionary = null;
Ah, so I decided to just make it a Supplier for users of FrontCodedIndexed so that I don't have to do what nearly all of the actual users of GenericIndexed are doing: calling singleThreaded() on the top-level column GenericIndexed dictionary to get an optimized version that isn't thread-safe, for use in a single thread, so that it creates less garbage. By using a supplier I can avoid all this, since every thread just gets its own copy (which is basically what all callers are doing anyway).
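A hypothetical sketch of the supplier pattern being described — the names here are illustrative, not the actual Druid API; the point is that handing each thread its own cheap view over shared immutable data removes the need for a singleThreaded()-style escape hatch:

```java
import java.util.function.Supplier;

public class SupplierPatternSketch {
  // Stand-in for a dictionary lookup; the real type would be FrontCodedIndexed.
  interface Dictionary {
    String get(int index);
  }

  // Each call to the supplier returns a fresh, single-thread view over the
  // shared immutable backing data, so no per-lookup state is ever shared.
  static Supplier<Dictionary> dictionarySupplier(String[] sharedSortedValues) {
    return () -> index -> sharedSortedValues[index];
  }

  public static void main(String[] args) {
    Supplier<Dictionary> supplier = dictionarySupplier(new String[]{"a", "b", "c"});
    Dictionary perThreadCopy = supplier.get(); // called once per query thread
    System.out.println(perThreadCopy.get(1)); // prints "b"
  }
}
```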