[BanyanDB] Optimizing Column Encoding
Search before asking
- [X] I had searched in the issues and found no similar feature requirement.
Description
There are several optimizations we should apply to column encoding. Here, "column" refers to both tags and fields.
- Move constant values within a block to the metadata.
- Encode each column based on its data type (string, int64, or float64).
- Use low cardinality encoding when the column has a limited set of value options within a block.
- Consider encoding array types using a columnar strategy. If the array size is consistent within a block, transform the arrays into a matrix and group and encode the values within the same column.
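As a rough illustration of the ideas above (not BanyanDB's actual implementation), the Go sketch below classifies one int64 column within a block and transposes fixed-size arrays into a matrix of columns. The names `columnKind`, `chooseKind`, `transposeArrays`, and the `maxDistinct` threshold are hypothetical and chosen only for this example.

```go
package main

// columnKind is a hypothetical label for the encoding chosen for one column in one block.
type columnKind int

const (
	kindConstant       columnKind = iota // all values identical: store once in block metadata
	kindLowCardinality                   // few distinct values: candidate for dictionary encoding
	kindPlain                            // otherwise: fall back to the type-specific encoder
)

// chooseKind inspects an int64 column within a block.
// maxDistinct is an assumed tuning threshold, not a value taken from BanyanDB.
// An empty column is reported as constant.
func chooseKind(values []int64, maxDistinct int) columnKind {
	distinct := make(map[int64]struct{})
	for _, v := range values {
		distinct[v] = struct{}{}
		if len(distinct) > maxDistinct {
			return kindPlain
		}
	}
	if len(distinct) <= 1 {
		return kindConstant
	}
	return kindLowCardinality
}

// transposeArrays converts rows of equal-length arrays into columns so that
// values at the same array position can be encoded together.
// It returns false when the block is empty or the array sizes differ.
func transposeArrays(rows [][]int64) ([][]int64, bool) {
	if len(rows) == 0 {
		return nil, false
	}
	width := len(rows[0])
	for _, r := range rows {
		if len(r) != width {
			return nil, false
		}
	}
	cols := make([][]int64, width)
	for c := range cols {
		cols[c] = make([]int64, len(rows))
		for r := range rows {
			cols[c][r] = rows[r][c]
		}
	}
	return cols, true
}
```

Each column returned by `transposeArrays` could then be passed through `chooseKind` on its own, so that an array position that never changes within a block is treated as a constant.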
Use case
No response
Related issues
No response
Are you willing to submit a pull request to implement this on your own?
- [ ] Yes I am willing to submit a pull request on my own!
Code of Conduct
- [X] I agree to follow this project's Code of Conduct
Please assign to me.
Are you confident you can take on two tasks at the same time? How well do you understand BanyanDB?
I am familiar with the BanyanDB code and have contributed 10 PRs, but I don't think this is an easy task. Let me discuss it with @hanahmily first.
This is planned for the next iteration only, unless you finish it in time for 0.7. So don't hurry; take your time.
@hanahmily Please note that, as we are making the docs user-oriented, the file structure docs should cover encoding with proper explanations and clear examples.
@sollhui Let's discuss the details first.
@wu-sheng Sure, we will update the relevant documents according to the new structures. This change will not break the file system; therefore, we do not have to increase the file system version.
Let's discuss the details once you have the design. I am not sure how the encoding can change without affecting the storage structure. A change doesn't have to be a breaking one: for example, a new encoding type would introduce new structure in the file without breaking compatibility.
I'd like to take on low cardinality encoding. I think we can use dictionary encoding for the values, and employ RLE and bit-packing to encode the dictionary indices. @hanahmily
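For illustration only, the proposal above could be sketched in Go roughly as follows. This is a minimal sketch that assumes string values and an in-memory representation; `run`, `dictEncode`, `dictDecode`, and `indexBitWidth` are hypothetical names, not existing BanyanDB functions, and the actual bit-packing of indices into bytes is left out.

```go
package main

import "math/bits"

// run is one run-length-encoded span of identical dictionary indices.
type run struct {
	index uint32 // position of the value in the dictionary
	count uint32 // number of consecutive rows holding that value
}

// dictEncode builds a dictionary in first-seen order and run-length encodes
// the resulting sequence of dictionary indices.
func dictEncode(values []string) (dict []string, runs []run) {
	positions := make(map[string]uint32)
	for _, v := range values {
		idx, seen := positions[v]
		if !seen {
			idx = uint32(len(dict))
			positions[v] = idx
			dict = append(dict, v)
		}
		if n := len(runs); n > 0 && runs[n-1].index == idx {
			runs[n-1].count++ // extend the current run
		} else {
			runs = append(runs, run{index: idx, count: 1})
		}
	}
	return dict, runs
}

// dictDecode expands the runs back into the original values.
func dictDecode(dict []string, runs []run) []string {
	var out []string
	for _, r := range runs {
		for i := uint32(0); i < r.count; i++ {
			out = append(out, dict[r.index])
		}
	}
	return out
}

// indexBitWidth returns how many bits each index would need if the indices
// were bit-packed; a dictionary with at most one entry needs no bits.
func indexBitWidth(dictSize int) int {
	if dictSize <= 1 {
		return 0
	}
	return bits.Len(uint(dictSize - 1))
}
```

With a column such as `GET, GET, GET, POST, POST, GET`, this sketch yields a two-entry dictionary, three runs, and an index width of one bit. Whether RLE or bit-packing wins for a given block depends on how long the runs are, so a real encoder would probably pick between them per block.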
Great, I will create a sub-task for you.