[BanyanDB] Optimizing Column Encoding

Open hanahmily opened this issue 1 year ago • 8 comments

Search before asking

  • [X] I had searched in the issues and found no similar feature requirement.

Description

There are several optimizations we should apply to column encoding. Here, "column" refers to both tags and fields.

  • Move constant values within a block to the metadata.
  • Encode each column based on its data type (string, int64, or float64).
  • Use low cardinality encoding when the column has a limited set of value options within a block.
  • Consider encoding array types using a columnar strategy. If the array size is consistent within a block, transform the arrays into a matrix and group and encode the values within the same column.
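As a rough illustration of the first and last items, the sketch below detects a constant column and transposes equally sized arrays into per-position columns. The function names `isConstant` and `transpose` are hypothetical and are not taken from the BanyanDB code base; the sketch only assumes int64 values held in memory.

```go
package main

import "fmt"

// isConstant reports whether every value in the column is identical.
// In that case a single copy could be stored in the block metadata.
// (Hypothetical helper, not a BanyanDB API.)
func isConstant(col []int64) bool {
	if len(col) == 0 {
		return false
	}
	for _, v := range col[1:] {
		if v != col[0] {
			return false
		}
	}
	return true
}

// transpose turns one array per row into one column per array position,
// so values at the same position can be grouped and encoded together.
// It returns ok=false when the array sizes differ within the block.
// (Hypothetical helper, not a BanyanDB API.)
func transpose(rows [][]int64) (cols [][]int64, ok bool) {
	if len(rows) == 0 {
		return nil, false
	}
	width := len(rows[0])
	for _, r := range rows {
		if len(r) != width {
			return nil, false
		}
	}
	cols = make([][]int64, width)
	for c := range cols {
		cols[c] = make([]int64, len(rows))
		for r := range rows {
			cols[c][r] = rows[r][c]
		}
	}
	return cols, true
}

func main() {
	// Three rows, each holding an array of size two.
	cols, ok := transpose([][]int64{{1, 10}, {2, 10}, {3, 10}})
	fmt.Println(cols, ok)
	for i, c := range cols {
		fmt.Printf("column %d constant: %v\n", i, isConstant(c))
	}
}
```

In this example the second transposed column holds the same value in every row, so the two ideas could combine: after transposition, a column like that would be a candidate for moving into metadata.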

Use case

No response

Related issues

No response

Are you willing to submit a pull request to implement this on your own?

  • [ ] Yes I am willing to submit a pull request on my own!

Code of Conduct

hanahmily avatar Jul 15 '24 23:07 hanahmily

Please assign to me.

sollhui avatar Jul 16 '24 05:07 sollhui

Are you confident you can take on two tasks at the same time? How well do you understand BanyanDB?

wu-sheng avatar Jul 16 '24 05:07 wu-sheng

I am familiar with the BanyanDB code and have contributed 10 PRs, but I don't think this is an easy task. Let me discuss it with @hanahmily first.

sollhui avatar Jul 16 '24 05:07 sollhui

This is planned for the next iteration, unless you can finish it in time for 0.7. So there is no hurry; take your time.

@hanahmily Please note that, as we are making the docs user-oriented, the file structure docs should cover the encodings with proper explanations and clear examples.

wu-sheng avatar Jul 16 '24 05:07 wu-sheng

@sollhui Let's discuss the details first.

@wu-sheng Sure, we will update the relevant documents according to the new structures. This change will not break the file system; therefore, we do not have to increase the file system version.

hanahmily avatar Jul 16 '24 07:07 hanahmily

Let's discuss the details once you have the design. I am not sure how changing the encoding could leave the storage structure unaffected. A change is not necessarily a breaking change: for example, a new encoding type would add a new structure to the file without breaking existing ones.

wu-sheng avatar Jul 16 '24 13:07 wu-sheng

I'd like to take on low cardinality encoding. I think we can use dictionary encoding for the values, and employ RLE and bit-packing to encode the dictionary indices. @hanahmily
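A minimal sketch of that idea, assuming string values in memory: build a dictionary, then encode the indices either as runs (RLE) or with a fixed bit width (bit-packing). All names here (`dictEncode`, `rleEncode`, `bitWidth`, `bitPack`, `run`) are hypothetical and the byte layout is an arbitrary choice for illustration, not BanyanDB's actual on-disk format.

```go
package main

import (
	"fmt"
	"math/bits"
)

// dictEncode assigns each distinct value an index in order of first
// appearance and returns the dictionary plus one index per input value.
func dictEncode(values []string) (dict []string, indices []uint32) {
	seen := make(map[string]uint32)
	for _, v := range values {
		id, ok := seen[v]
		if !ok {
			id = uint32(len(dict))
			seen[v] = id
			dict = append(dict, v)
		}
		indices = append(indices, id)
	}
	return dict, indices
}

// run is one RLE entry: the dictionary index repeated count times.
type run struct {
	index uint32
	count uint32
}

// rleEncode collapses consecutive equal indices into runs.
func rleEncode(indices []uint32) []run {
	var runs []run
	for _, id := range indices {
		if n := len(runs); n > 0 && runs[n-1].index == id {
			runs[n-1].count++
			continue
		}
		runs = append(runs, run{index: id, count: 1})
	}
	return runs
}

// bitWidth returns how many bits are needed to store any index of a
// dictionary with dictSize entries (at least one bit).
func bitWidth(dictSize int) int {
	if dictSize <= 1 {
		return 1
	}
	return bits.Len(uint(dictSize - 1))
}

// bitPack writes each index using width bits, least significant bit first.
func bitPack(indices []uint32, width int) []byte {
	out := make([]byte, (len(indices)*width+7)/8)
	pos := 0
	for _, id := range indices {
		for b := 0; b < width; b++ {
			if id&(1<<uint(b)) != 0 {
				out[pos/8] |= 1 << uint(pos%8)
			}
			pos++
		}
	}
	return out
}

func main() {
	values := []string{"GET", "GET", "GET", "POST", "POST", "GET"}
	dict, indices := dictEncode(values)
	fmt.Println(dict, indices)
	fmt.Println(rleEncode(indices))
	fmt.Println(bitPack(indices, bitWidth(len(dict))))
}
```

For this input the six values reduce to a two-entry dictionary, three runs, and a single packed byte. Whether RLE or bit-packing is the better fit for a given block would depend on how long the runs are, which is a design decision this sketch leaves open.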

ButterBright avatar May 29 '25 03:05 ButterBright

Great, I will create a sub-task for you.

hanahmily avatar May 29 '25 03:05 hanahmily