HaloDB icon indicating copy to clipboard operation
HaloDB copied to clipboard

feature/support large keys (#42)

Open scottcarey opened this issue 5 years ago • 3 comments

This change allows for keys to be up to 2047 bytes long for both non-pooled and memory-pool implementations. Performance is moderately improved in the process, and the RAM required for the memory pool is reduced by 2+ bytes per key.

Each individual commit should build and test successfully, this was developed incrementally and tested at each step. It may be easier to follow by first perusing the commits individually, then looking at the combined diff after that.

Read the full individual commit messages for more detailed information. A few highlights:

In the file format, 'version' is now a number between 0 and 31. 'key size' is a number between 0 and 2047. For the in-memory data structures, the 5 bytes that used to hold key size and value size are now split with 11 bits for the key (size between 0 and 2047) and 29 bits for the value size (max value size 512MB). MemoryPoolSegments are now addressed by their slot rather than their offset, which allows one more byte per entry to be removed, leading to 4-byte MemoryPoolAddress values, saving a byte per slot in the master table and a byte per slot in the pool. The non-pooled implementation supports large keys in a simple way, by allocating a larger entry. The memory pool implementation uses multiple slots (chained together) to hold larger key data. If keys are equal to or smaller than the fixedKeyLength, these function as before. If keys are larger, slots are used to hold the 'overflowing' portion of the key. Lastly, the memory pool implementation was refactored a bit to encapsulate the chunk offsets inside of a new MemoryPoolChunk.Slot class, which makes the code a lot easier to read and removes a lot of scope for error as the raw offsets are not passed around between chunks and addresses loosely.

I confirm that this contribution is made under the terms of the license found in the root directory of this repository's source tree and that I have the authority necessary to make this contribution on behalf of its copyright owner.

scottcarey avatar Dec 05 '19 22:12 scottcarey

The two stack traces in the failed build make no sense at all. A byte array's length field is being passed in to another byte array initializer, which gives a negative array size exception. A byte array can't have a negative size... I ran the tests locally a dozen times without issue. I'll kick off travis again.

scottcarey avatar Dec 05 '19 23:12 scottcarey

I take that back, I can see that these tests have a ~ 4% or so chance of failing due to the test set-up.

scottcarey avatar Dec 05 '19 23:12 scottcarey

If no dependency between the commits, I would suggest split to multiple PRs with good test coverage and have the changes in step by step.

wangtao724 avatar Dec 26 '19 22:12 wangtao724