kvrocks icon indicating copy to clipboard operation
kvrocks copied to clipboard

Refactor the management of rocksdb ColumnFamily

Open mapleFU opened this issue 1 year ago • 7 comments
trafficstars

Search before asking

  • [X] I had searched in the issues and found no similar issues.

Motivation

Currently, kvrocks has multiple Column Families, and some configs are written ad-hoc. So, some issues below would be tricky:

  1. https://github.com/apache/kvrocks/issues/2193
  2. https://github.com/apache/kvrocks/issues/2263

Now I think we need a class for owning the kvrocks cf configs.

Solution

A unified CF manager

Are you willing to submit a PR?

  • [X] I'm willing to submit a PR!

mapleFU avatar Apr 28 '24 05:04 mapleFU

For setting the write_buff_size for each column family. There are a simple way that i can think of is to extend config set rocksdb.write_buffer_size num command.

For example, the command of config set rocksdb.write_buffer_size [cf_name] num is used to set the buff_size of the specify cf.

More example:

config set rocksdb.write_buffer_size ALL num  
# equivalent to `config set rocksdb.write_buffer_size num`
# settings the buff_size for all of column families

config set rocksdb.write_buffer_size MAJOR num
# for major colunm families include metadata and default

config set rocksdb.write_buffer_size MINOR num
# for minor colunm families except for metadata and default

config set rocksdb.write_buffer_size [metadate | default | zset | pubsub | ... ] num
# for specify column family

jjz921024 avatar Apr 29 '24 14:04 jjz921024

I'm trying to add a code-level refactor on this but I'm feel sick these days, maybe I'll update in coming few days

@jjz921024 these style of config here ( https://github.com/apache/kvrocks/issues/2279#issuecomment-2082885791 ) can separate as configs and would be nice for user

mapleFU avatar Apr 29 '24 15:04 mapleFU

For setting the write_buff_size for each column family. There are a simple way that i can think of is to extend config set rocksdb.write_buffer_size num command.

For example, the command of config set rocksdb.write_buffer_size [cf_name] num is used to set the buff_size of the specify cf.

More example:

config set rocksdb.write_buffer_size ALL num  
# equivalent to `config set rocksdb.write_buffer_size num`
# settings the buff_size for all of column families

config set rocksdb.write_buffer_size MAJOR num
# for major colunm families include metadata and default

config set rocksdb.write_buffer_size MINOR num
# for minor colunm families except for metadata and default

config set rocksdb.write_buffer_size [metadate | default | zset | pubsub | ... ] num
# for specify column family

That's a good idea. However, here are two things that need to be considered:

  • how to be compatible with old config files
  • previously I try to make all config keys unique, since it will make the config part more intuitive and easy to process in the code level, especially for GET and REWRITE (I spent lots of time to rewrite code of the config part). But this design will break it.

PragmaTwice avatar Apr 29 '24 16:04 PragmaTwice

About the naming of column family:

AS IS:

0 default
1 metadata
2 zset score
3 pubsub
4 propagate
5 stream
6 search

TO BE:

0 primary subkeys
1 key metadata
2 secondary subkeys
3 pubsub
4 propagate
5 TBD (I don't know why stream should have a seperate cf)
6 search

PragmaTwice avatar May 06 '24 05:05 PragmaTwice

Also, we should have a better abstraction of the management of column families.

Currently it's too hard to maintain, and these adhoc logic about column families is written randomly in everywhere.

PragmaTwice avatar May 06 '24 05:05 PragmaTwice

Since it seems very important, I'll add it into our roadmap. cc @mapleFU @git-hulk

PragmaTwice avatar May 06 '24 05:05 PragmaTwice

Also, we should have a better abstraction of the management of column families.

Currently it's too hard to maintain, and these adhoc logic about column families is written randomly in everywhere.

Agreed, adding the new column family also will break the forward compatibility, we also need to address them while releasing.

git-hulk avatar May 06 '24 12:05 git-hulk