Add Version.DB; encode map entries using correct ByteComparable encoding
Introduce SAI on disk format Version.DB.
Goal
Support order by for a give map key, and support faster range queries on map entries.
Problem
We currently encode map entries using the ByteSource.of method, and that assumes the bytes are already capable of being sorted correctly.
Solution
We must encode map entries using a byte comparable encoding that retains the source type's order.
Modifications
- Encode map entries for memory and on disk trie indexes using the appropriate
CompositeTypeand do not attempt to unescape the result. - Ensure that compaction applies the correct encoding via the
RAMStringSegmentBuilder. - Expand the
OnDiskFormatinterface with new methods to support encoding trie entries using the relevant format. This is necessary for writing earlier index formats and will ensure correct compatibility. It also makes the different indexes easier to read because there are fewer conditionals for versions. - Make the
RangeTermTreeclass version aware so that we can correctly identify relevant indexes to search. This is only necessary because of the special logic we have for range queries on maps. Perhaps we should consider only doing this when the index is an ENTRIES index on a map. - By using the correct map encoding in the trie, we can remove the unnecessary value check for satisfaction on the expression, which also means we can skip collecting the bytes in v4.
- Expand testing framework to cover new and old index formats.
Interesting implementation details
- The min and max terms in the
RangeTermTreeare not encoded (other than the call toTypeUtil.encode.... which does technically do some encoding, but more realistically truncates.)
Note: this is a prerequisite for index based ordering on maps.
Note 2: this change is covered by existing tests.
Looks like I broke a bunch of tests :). So the main ones that require this change are MapEntriesIndexTest, and those are passing. Obviously, we'll need to get the other tests passing too.
This is ready for review @jbellis @jkni @pkolaczk
The only open question is: how do we implement tests for the different versions. I manually confirmed that things work for CA, and CI shows that DA is working.
Apache is going to use version da eventually. There are ways to tell Apache's files apart, but to avoid confusion can we make this db instead?
We can certainly use something different, though I think that might mess with this kind of conditional check !Version.LATEST.onOrAfter(Version.DA). I'll need to look closer.
@pkolaczk - I felt the same way at first, but I think it already was version dependent. Because we dump the in memory true to disk, isn't it already version dependent? Further, if we want to run with CA for a while before enabling DB, we need to encode maps the old way.
I feel that we don't still have a "best practice" yet, I would say that by default we should use the old version.
I agree that it makes sense to set the default back to the old version. When I first wrote the code, we didn't have as many parameterized tests, so this was the only way to get coverage. However, we might be good enough now to get the right level of coverage even if we don't have the default set to db (obviously we should have coverage of all versions regardless of the default).
I just changed the default version. Let's see what the test coverage changes to. The coverage before this most recent commit was 93.3%.
Code coverage looks great still.
Quality Gate passed
Issues
7 New issues
0 Accepted issues
Measures
0 Security Hotspots
94.0% Coverage on New Code
0.0% Duplication on New Code