erocksdb icon indicating copy to clipboard operation
erocksdb copied to clipboard

all: Impl RocksDB specific features

Open mocchira opened this issue 10 years ago • 13 comments

  • open_with_cf
  • list_column_families
  • create_column_family
  • drop_column_family
  • put/5 (with cf)
  • delete/4 (with cf)
  • get/4 (with cf)
  • iterator_with_cf
  • count/2 (with cf)
  • status/3 (with cf)

mocchira avatar Jan 23 '15 05:01 mocchira

what is the status of these features?

benoitc avatar Jan 13 '16 00:01 benoitc

We do not implement them yet because LeoFS' libs do not use their functions.

Actually, we have been working on another project, Astra - a distributed database (not OSS). If we need their functions in Astra, we'll implement them. In next week, we're planning to discuss whether we implement them or not, then I'll get in touch with you.

yosukehara avatar Jan 13 '16 03:01 yosukehara

Hrm OK. On a side node it's confusing to have the API already in the source even if it raises some error. Why not simply removing it for now so we don't have to read extra code until it's done ?

benoitc avatar Jan 13 '16 21:01 benoitc

@yosukehara I would like to implement that change but not ure about the current API. You will notice that the rocksdb API to use column has slightly changed since:

https://github.com/facebook/rocksdb/blob/master/include/rocksdb/db.h#L150

The thing I am not sure is if I need to pass to the get and other functions a ColumnFamily handle or only the name of the column and keep the handle in the DbObject instance instead using a map.

Ex:

CfOptions = [],
Cf = erocksdb:column_family(mycf, CfOptions),
{ok, Db} = erocksdb:open("path", Options, [Cf]),
{ok, Val} = erocksdb:get(Db, mycf, "somekey", []);
..

The column name will be an atom, The atom will be used to retrieve the column family handle from the database.

Another way would be retrieving ColumnHandles in a map like in the rust binding:

CfOptions = [],
Cf = erocksdb:column_family("mycf," CfOptions),
{ok, Db, CfHandles} = erocksdb:open("path", Options, [Cf]),
{ok, Val} = erocksdb:get(Db, maps:get("mycf", CfHandles), "somekey", []);
..

But it will imply to remove the support for erlang < 18. Also this API is more verbose.

Thoughts?

benoitc avatar Jan 21 '16 07:01 benoitc

cc @mocchira

benoitc avatar Jan 21 '16 09:01 benoitc

@benoitc Thank you for your comments. I asked @mocchira to communicate with you to improve erocksdb.

yosukehara avatar Jan 22 '16 01:01 yosukehara

@benoitc Thank you for coming here! Let's discuss about what the erlang I/F should be.

At first, Let me clarify some points.

You will notice that the rocksdb API to use column has slightly changed since:

I've checked the diff include/rocksdb/db.h between 3.11 and 4.1,

git diff v3.11 rocksdb-4.1 include/rocksdb/db.h

but seems there is no changes affecting our predefined ( return {error, not_implemented} ) column family related I/F. Could you elaborate that?

The thing I am not sure is if I need to pass to the get and other functions a ColumnFamily handle or ?only the name of the column and keep the handle in the DbObject instance instead using a map.

IMHO as I said at the another PR, We'd like to basically keep I/F consistent with RocksDB's one. ( this rule may give us opportunities to generate erlang I/F automatically according to the original I/F when new I/F will be added ) So Passing a ColumnFamily handle is the first choice for me.

but I've thought the current predefined I/F could be improved more such as naming suffixed by _with_cf to avoid fun signature conflicts are awkward to me.. So any suggestions are welcome!

As a side note,

The column name will be an atom

Since there is no limit about the number of column families on rocksdb, atom should be avoided.

But it will imply to remove the support for erlang < 18. Also this API is more verbose.

AFAIK, Since the minimum version of erlang our users have used is R16B03-1, map also should be avoided.

mocchira avatar Jan 22 '16 06:01 mocchira

@mocchira I wasn't sure about the open_with_cf, I don't see it DB.h . Also the main question is how do you recover the ColumnFamily handles once you open. As a list in the same order? As a proplists?

I will have some cycle this we to do it so let me know :)

benoitc avatar Jan 22 '16 09:01 benoitc

@benoitc Sorry for the late reply.

I wasn't sure about the open_with_cf, I don't see it DB.h

https://github.com/facebook/rocksdb/blob/master/include/rocksdb/db.h#L150

  static Status Open(const DBOptions& db_options, const std::string& name,
                     const std::vector<ColumnFamilyDescriptor>& column_families,
                     std::vector<ColumnFamilyHandle*>* handles, DB** dbptr);

is corresponding with https://github.com/leo-project/erocksdb/blob/develop/src/erocksdb.erl#L195

-spec(open_with_cf(Name, DBOpts, CFDescriptors) ->
             {ok, db_handle(), list(cf_handle())} | {error, any()}
               when Name::file:filename_all(),
                    DBOpts :: db_options(),
                    CFDescriptors :: list(#cf_descriptor{})).

Also the main question is how do you recover the ColumnFamily handles once you open. As a list in the same order? As a proplists?

good point. I think the optimal choice how to recover ( associate | bind ) ColumnFamily handles depends on the application ( library users ) context. For example,

  • For a batch program which simply open a db with dozen of column families and do something and finally close, a list in the same order O(N) may be sufficient.
  • For a batch program which simply open a db with hundreds of column families and ..., O(log N) data structure with the key/value semantics like orddict, gb_trees may be sufficient.
  • For a web front-end which open a db with hundreds of column families at startup and do something from thousands of erlang processes concurrently, ets may be suitable choice.

so providing a I/F for the most simplest use case and leave recovering (binding) to applications as needed is one of the right choice, I think.

Yet another choice I was thinking is providing a I/F which take a function for recovering (binding) CF handles like this.


%% Bind a column family handle with its name in this function,
%% So how to bind is up to users!
-type bind_fun() :: fun((ColumnName::binary(), ColumnHandle::any()) -> void()).

-spec(open(Name, DBOpts, CFDescriptors, BindFunc) ->
             {ok, db_handle()} | {error, any()}
               when Name::file:filename_all(),
                    DBOpts :: db_options(),
                    CFDescriptors :: list(#cf_descriptor{}),
                    BindFunc :: bind_fun()).

For now, I'd stick with the first most simplest one As a list in the same order ( vector in the same order in C++ ). Thoughts?

I will have some cycle this we to do it so let me know :)

thanks :)

mocchira avatar Jan 27 '16 06:01 mocchira

@mocchira I am not sure we need another functions though. IImo just open() with the cfs option should be ok and more erlangish. though that's a minor detail.

Also do we really need to create this ColumnFamilyDescriptor ? Why not keeping it in the nif context to avoid to switch the contexts too often? Passing a tuple like {Name, ColumnFamillyOptions} would be enough. Especially that this ColumnFamilyDescriptor is only use once in the open function. Thoughts?

About the function to pass to this function to bind the handle to a custom data structure, that's really a good idea. I will work around that.

benoitc avatar Jan 27 '16 21:01 benoitc

@benoitc

I am not sure we need another functions though. IImo just open() with the cfs option should be ok and > more erlangish. though that's a minor detail.

Got it. Now we have 2 open functions. The first one open/3 is for opening a db with the default column family, The second open_with_cf/3 is for opening a db with the explicitly specified column families. Since the first one has been left for backward compatibility, we want to keep that I/F as possible.

Also do we really need to create this ColumnFamilyDescriptor ? Why not keeping it in the nif context to avoid to switch the contexts too often? Passing a tuple like {Name, ColumnFamillyOptions} would be enough. Especially that this ColumnFamilyDescriptor is only use once in the open function. Thoughts?

either is ok to me. so please go ahead with the tuple way!

About the function to pass to this function to bind the handle to a custom data structure, that's really a good idea. I will work around that.

thanks!

In summary,

  • open/3 (hold)

%% Open with the default column family.
-spec(open(Name, DBOpts, CFOpts) ->
             {ok, db_handle()} | {error, any()} when Name::file:filename_all(),
                                                     DBOpts::db_options(),
                                                     CFOpts::cf_options()).
  • open/4 (the new one)
%% Used for open/4 to bind a column family name with its option
-type cf_descriptor :: {CFName::binary(), CFOpts::cf_options()}.

%% Used for open/4 to bind a column family handle with its name in this function
-type bind_fun() :: fun((CFName::binary(), CFHandle::cf_handle()) -> void()).

%% Open with the explicitly specified column families
-spec(open(Name, DBOpts, CFDescriptors, BindFunc) ->
             {ok, db_handle()} | {error, any()}
               when Name::file:filename_all(),
                    DBOpts :: db_options(),
                    CFDescriptors :: list(cf_descriptor()),
                    BindFunc :: bind_fun()).

Please correct me if there are something differences from what you think.

mocchira avatar Jan 28 '16 04:01 mocchira

@mocchira yes this si exactly what I was thinking of, thanks for the summary :) If it's OK i will work on that. I should have a patch until monday :)

benoitc avatar Jan 28 '16 09:01 benoitc

great to hear that! ok, please go ahead. we are looking forward to your PR! 2016/01/28 18:12 "Benoit Chesneau" [email protected]:

@mocchira https://github.com/mocchira yes this si exactly what I was thinking of, thanks for the summary :) If it's OK i will work on that. I should have a patch until monday :)

— Reply to this email directly or view it on GitHub https://github.com/leo-project/erocksdb/issues/3#issuecomment-176076587.

mocchira avatar Jan 28 '16 09:01 mocchira