RocksDict

How to get WideColumns data in "Raw Mode" from RocksDB

Open • fmvin opened this issue 9 months ago • 6 comments

The DB was created by a C++ app and contains a lot of WideColumns data. Is it possible to access this data with RocksDict? In the code example below, the value size (len(v)) is always 0, although the keys (k) are shown as expected.

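from rocksdict import Rdict, Options

# GLB_PATH is the path to the DB created by the C++ app
cf_lst = Rdict.list_cf(GLB_PATH)
opts = Options(raw_mode=True)
db = Rdict(path=GLB_PATH, options=opts, column_families={cf_lst[1]: opts})
db_cf1 = db.get_column_family(cf_lst[1])

# every key is printed, but len(v) is always 0 for the wide-column entities
for k, v in db_cf1.items():
    print(f'k={k}, v size={len(v)}')

db.close()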

OS: Windows Server 2019; compiler: MSVC 19.39.33523; RocksDB: v9.0; RocksDict: v0.3.23

fmvin, May 7, 2024 21:05

Looks like WideColumns are not yet supported in RocksDict; supporting them will require adding APIs such as GetEntity and PutEntity.

Congyuwang, May 9, 2024 12:05

Quoting the RocksDB wide-column documentation:

The classic Get, MultiGet, GetMergeOperands, and Iterator::value APIs return the value of the default column when they encounter an entity, while the new APIs GetEntity, MultiGetEntity, and Iterator::columns return any plain key-value in the form of an entity with a single column, namely the anonymous default column.

So the iterator returns the value of the default column, which is empty for these entities. Reading the other columns requires the Iterator::columns() API, which RocksDict does not support yet.
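To make the quoted behavior concrete, here is a minimal pure-Python model of it. It is not RocksDB or RocksDict code: the names Record, classic_value, and entity_columns are hypothetical, and an entity is assumed to be a dict from column name to value, with the anonymous default column named by the empty byte string (RocksDB's default column name is empty). The model shows why an entity that has no default column reads back as an empty value through the classic APIs.

```python
from typing import Dict, Union

# Hypothetical model of one stored record: either a plain value (bytes)
# or a wide-column entity (column name -> column value).
Entity = Dict[bytes, bytes]
Record = Union[bytes, Entity]

DEFAULT_COLUMN = b""  # the anonymous default column has an empty name


def classic_value(record: Record) -> bytes:
    """Models Get / Iterator::value: only the default column is visible."""
    if isinstance(record, bytes):
        return record
    # An entity without a default column yields an empty value.
    return record.get(DEFAULT_COLUMN, b"")


def entity_columns(record: Record) -> Entity:
    """Models GetEntity / Iterator::columns: every column is visible."""
    if isinstance(record, bytes):
        # A plain key-value comes back as a single anonymous default column.
        return {DEFAULT_COLUMN: record}
    return dict(record)


row = {b"price": b"101.5", b"qty": b"20"}  # entity with no default column
print(len(classic_value(row)))   # 0, like len(v) in the report above
print(entity_columns(row))       # {b'price': b'101.5', b'qty': b'20'}
print(entity_columns(b"plain"))  # {b'': b'plain'}
```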

Congyuwang, May 9, 2024 12:05

Just curious, what do you use WideColumns for?

For now, if the objects are not too large, I suggest storing the entities with your own custom serialization and deserializing them on the Python side. RocksDB has not yet exposed the WideColumns APIs through its C interface, so I will need some time and will have to wait for RocksDB to design a proper C interface for them.
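As a stopgap along the lines suggested above, an entity can be flattened into one plain value and decoded in Python. The sketch below assumes a simple length-prefixed binary layout built with the standard struct module; the layout and the names encode_entity and decode_entity are hypothetical, not a RocksDB or RocksDict format. This only helps if the writer stores entities in this encoded form; it cannot decode RocksDB's own wide-column entities.

```python
import struct
from typing import Dict, Tuple

# Hypothetical layout (little-endian): u32 column count, then for each column
# a u32-length-prefixed name followed by a u32-length-prefixed value.


def _pack_bytes(data: bytes) -> bytes:
    """Prefix a byte string with its length as an unsigned 32-bit integer."""
    return struct.pack("<I", len(data)) + data


def _read_bytes(blob: bytes, offset: int) -> Tuple[bytes, int]:
    """Read one length-prefixed byte string; return it and the next offset."""
    (length,) = struct.unpack_from("<I", blob, offset)
    start = offset + 4
    end = start + length
    if end > len(blob):
        raise ValueError("truncated entity")
    return blob[start:end], end


def encode_entity(columns: Dict[bytes, bytes]) -> bytes:
    """Serialize a column-name -> value mapping into one plain value."""
    parts = [struct.pack("<I", len(columns))]
    for name, value in sorted(columns.items()):  # sorted: deterministic bytes
        parts.append(_pack_bytes(name))
        parts.append(_pack_bytes(value))
    return b"".join(parts)


def decode_entity(blob: bytes) -> Dict[bytes, bytes]:
    """Inverse of encode_entity; raises on truncated or malformed input."""
    (count,) = struct.unpack_from("<I", blob, 0)
    offset = 4
    columns: Dict[bytes, bytes] = {}
    for _ in range(count):
        name, offset = _read_bytes(blob, offset)
        value, offset = _read_bytes(blob, offset)
        columns[name] = value
    if offset != len(blob):
        raise ValueError("trailing bytes after the last column")
    return columns


row = {b"price": b"101.5", b"qty": b"20"}
blob = encode_entity(row)
print(decode_entity(blob) == row)  # True
```

In raw mode the encoded blob is an ordinary bytes value, so it could be written with db[key] = blob and read back with decode_entity(db[key]). A general-purpose format such as MessagePack would serve the same role with less hand-written code.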

Congyuwang, May 10, 2024 04:05

Related: https://github.com/facebook/rocksdb/issues/12635

Congyuwang, May 10, 2024 04:05

Some kind of in-memory tables, with a varying number of columns in each row, that are updated frequently. I have found that WideColumns fit my app architecture well and allowed me to migrate easily from kx kdb.

For iterating from Python, I'm going to create special copies of several tables, serializing the entities with MessagePack as you proposed. But this adds some overhead.

fmvin, May 10, 2024 13:05

I've already drafted an upstream PR: https://github.com/facebook/rocksdb/pull/12653

Congyuwang, May 15, 2024 01:05

Check the wide_columns_raw examples after installing the beta with pip install rocksdict==0.3.24b1 (available on PyPI).

Tell me if it works 🙂.

Congyuwang, May 15, 2024 05:05

No success. With my real DB, I cannot access the wide columns in a column family (CF). Could you provide a simple example of how to use the get_entity method with a CF? The DB itself is fine; I checked it with the ldb tool.

fmvin, May 16, 2024 02:05

I'm about to release beta.2, which will make it much more straightforward to open DBs created from other languages (C++, Java, Rust).

Congyuwang, May 16, 2024 03:05

It seems I didn't explain the problem clearly: I can't figure out how to pass a CF to the get_entity method.

fmvin, May 16, 2024 03:05

OK. Try pip install rocksdict==0.3.24b2, then:

from rocksdict import Rdict

# This will automatically load latest options and column families.
# Note also that this is automatically RAW MODE,
# as it knows that the db is not created by RocksDict.
db = Rdict("db_path")

# list column families
cfs = Rdict.list_cf("db_path")
print(cfs)

# use one of the column families
cf1 = db.get_column_family(cfs[1])

# iterate over every entity (key and its wide columns) in cf1
for k, v in cf1.entities():
    print(f"{k} -> {v}")

# or query the entity stored under one specific key in cf1
print(cf1.get_entity(b"some_key"))

db.close()

Tell me if it works.

Congyuwang, May 16, 2024 04:05

The design of RocksDict is that you do not pass a cf argument to get, put, iter, get_entity, etc. Instead, call some_cf = db.get_column_family("some_cf_name"), which returns an object with exactly the same methods as Rdict (get, put, delete, get_entity, etc.). All of these operations act only on the data in some_cf.
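The calling pattern described above, where the column family is chosen by the handle rather than by a cf argument, can be illustrated with a small pure-Python model. Everything here (MiniDB, ColumnFamilyView, and their methods) is hypothetical and only mimics the pattern; it is not RocksDict's implementation.

```python
from typing import Dict, List, Optional


class ColumnFamilyView:
    """Hypothetical handle: every method works on one column family only."""

    def __init__(self, store: Dict[bytes, bytes]) -> None:
        self._store = store

    def put(self, key: bytes, value: bytes) -> None:
        self._store[key] = value

    def get(self, key: bytes) -> Optional[bytes]:
        return self._store.get(key)

    def delete(self, key: bytes) -> None:
        self._store.pop(key, None)


class MiniDB(ColumnFamilyView):
    """Hypothetical DB: acts on the "default" CF and hands out CF handles."""

    def __init__(self, cf_names: List[str]) -> None:
        self._families: Dict[str, Dict[bytes, bytes]] = {
            name: {} for name in ["default", *cf_names]
        }
        super().__init__(self._families["default"])

    def get_column_family(self, name: str) -> ColumnFamilyView:
        # Unknown names raise KeyError instead of silently creating a CF.
        return ColumnFamilyView(self._families[name])


db = MiniDB(["cf1"])
cf1 = db.get_column_family("cf1")
cf1.put(b"k", b"v")   # no cf argument: the handle selects the column family
print(cf1.get(b"k"))  # b'v'
print(db.get(b"k"))   # None, because the default CF was not written
```

Because the DB object and every CF handle share one set of method signatures, code written against one handle works unchanged against another.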

Congyuwang, May 16, 2024 04:05