erocksdb
erocksdb copied to clipboard
all: Impl RocksDB specific features
- open_with_cf
- list_column_families
- create_column_family
- drop_column_family
- put/5 (with cf)
- delete/4 (with cf)
- get/4 (with cf)
- iterator_with_cf
- count/2 (with cf)
- status/3 (with cf)
what is the status of these features?
We do not implement them yet because LeoFS' libs do not use their functions.
Actually, we have been working on another project, Astra - a distributed database (not OSS). If we need their functions in Astra, we'll implement them. In next week, we're planning to discuss whether we implement them or not, then I'll get in touch with you.
Hrm OK. On a side node it's confusing to have the API already in the source even if it raises some error. Why not simply removing it for now so we don't have to read extra code until it's done ?
@yosukehara I would like to implement that change but not ure about the current API. You will notice that the rocksdb API to use column has slightly changed since:
https://github.com/facebook/rocksdb/blob/master/include/rocksdb/db.h#L150
The thing I am not sure is if I need to pass to the get and other functions a ColumnFamily handle or only the name of the column and keep the handle in the DbObject instance instead using a map.
Ex:
CfOptions = [],
Cf = erocksdb:column_family(mycf, CfOptions),
{ok, Db} = erocksdb:open("path", Options, [Cf]),
{ok, Val} = erocksdb:get(Db, mycf, "somekey", []);
..
The column name will be an atom, The atom will be used to retrieve the column family handle from the database.
Another way would be retrieving ColumnHandles in a map like in the rust binding:
CfOptions = [],
Cf = erocksdb:column_family("mycf," CfOptions),
{ok, Db, CfHandles} = erocksdb:open("path", Options, [Cf]),
{ok, Val} = erocksdb:get(Db, maps:get("mycf", CfHandles), "somekey", []);
..
But it will imply to remove the support for erlang < 18. Also this API is more verbose.
Thoughts?
cc @mocchira
@benoitc Thank you for your comments. I asked @mocchira to communicate with you to improve erocksdb
.
@benoitc Thank you for coming here! Let's discuss about what the erlang I/F should be.
At first, Let me clarify some points.
You will notice that the rocksdb API to use column has slightly changed since:
I've checked the diff include/rocksdb/db.h
between 3.11 and 4.1,
git diff v3.11 rocksdb-4.1 include/rocksdb/db.h
but seems there is no changes affecting our predefined ( return {error, not_implemented}
) column family related I/F.
Could you elaborate that?
The thing I am not sure is if I need to pass to the get and other functions a ColumnFamily handle or ?only the name of the column and keep the handle in the DbObject instance instead using a map.
IMHO as I said at the another PR, We'd like to basically keep I/F consistent with RocksDB's one. ( this rule may give us opportunities to generate erlang I/F automatically according to the original I/F when new I/F will be added ) So Passing a ColumnFamily handle is the first choice for me.
but I've thought
the current predefined I/F could be improved more such as naming suffixed by _with_cf
to avoid fun signature conflicts are awkward to me..
So any suggestions are welcome!
As a side note,
The column name will be an atom
Since there is no limit about the number of column families on rocksdb, atom should be avoided.
But it will imply to remove the support for erlang < 18. Also this API is more verbose.
AFAIK, Since the minimum version of erlang our users have used is R16B03-1, map also should be avoided.
@mocchira I wasn't sure about the open_with_cf, I don't see it DB.h
. Also the main question is how do you recover the ColumnFamily handles once you open. As a list in the same order? As a proplists?
I will have some cycle this we to do it so let me know :)
@benoitc Sorry for the late reply.
I wasn't sure about the open_with_cf, I don't see it DB.h
https://github.com/facebook/rocksdb/blob/master/include/rocksdb/db.h#L150
static Status Open(const DBOptions& db_options, const std::string& name,
const std::vector<ColumnFamilyDescriptor>& column_families,
std::vector<ColumnFamilyHandle*>* handles, DB** dbptr);
is corresponding with https://github.com/leo-project/erocksdb/blob/develop/src/erocksdb.erl#L195
-spec(open_with_cf(Name, DBOpts, CFDescriptors) ->
{ok, db_handle(), list(cf_handle())} | {error, any()}
when Name::file:filename_all(),
DBOpts :: db_options(),
CFDescriptors :: list(#cf_descriptor{})).
Also the main question is how do you recover the ColumnFamily handles once you open. As a list in the same order? As a proplists?
good point. I think the optimal choice how to recover ( associate | bind ) ColumnFamily handles depends on the application ( library users ) context. For example,
- For a batch program which simply open a db with dozen of column families and do something and finally close,
a list in the same order
O(N) may be sufficient. - For a batch program which simply open a db with hundreds of column families and ..., O(log N) data structure with the key/value semantics like
orddict
,gb_trees
may be sufficient. - For a web front-end which open a db with hundreds of column families at startup and do something from thousands of erlang processes concurrently,
ets
may be suitable choice.
so providing a I/F for the most simplest use case and leave recovering (binding) to applications as needed is one of the right choice, I think.
Yet another choice I was thinking is providing a I/F which take a function for recovering (binding) CF handles like this.
%% Bind a column family handle with its name in this function,
%% So how to bind is up to users!
-type bind_fun() :: fun((ColumnName::binary(), ColumnHandle::any()) -> void()).
-spec(open(Name, DBOpts, CFDescriptors, BindFunc) ->
{ok, db_handle()} | {error, any()}
when Name::file:filename_all(),
DBOpts :: db_options(),
CFDescriptors :: list(#cf_descriptor{}),
BindFunc :: bind_fun()).
For now,
I'd stick with the first most simplest one As a list in the same order
( vector in the same order in C++ ).
Thoughts?
I will have some cycle this we to do it so let me know :)
thanks :)
@mocchira I am not sure we need another functions though. IImo just open() with the cfs option should be ok and more erlangish. though that's a minor detail.
Also do we really need to create this ColumnFamilyDescriptor
? Why not keeping it in the nif context to avoid to switch the contexts too often? Passing a tuple like {Name, ColumnFamillyOptions}
would be enough. Especially that this ColumnFamilyDescriptor
is only use once in the open function. Thoughts?
About the function to pass to this function to bind the handle to a custom data structure, that's really a good idea. I will work around that.
@benoitc
I am not sure we need another functions though. IImo just open() with the cfs option should be ok and > more erlangish. though that's a minor detail.
Got it.
Now we have 2 open functions.
The first one open/3
is for opening a db with the default column family,
The second open_with_cf/3
is for opening a db with the explicitly specified column families.
Since the first one has been left for backward compatibility,
we want to keep that I/F as possible.
Also do we really need to create this ColumnFamilyDescriptor ? Why not keeping it in the nif context to avoid to switch the contexts too often? Passing a tuple like {Name, ColumnFamillyOptions} would be enough. Especially that this ColumnFamilyDescriptor is only use once in the open function. Thoughts?
either is ok to me. so please go ahead with the tuple way!
About the function to pass to this function to bind the handle to a custom data structure, that's really a good idea. I will work around that.
thanks!
In summary,
-
open/3
(hold)
%% Open with the default column family.
-spec(open(Name, DBOpts, CFOpts) ->
{ok, db_handle()} | {error, any()} when Name::file:filename_all(),
DBOpts::db_options(),
CFOpts::cf_options()).
-
open/4
(the new one)
%% Used for open/4 to bind a column family name with its option
-type cf_descriptor :: {CFName::binary(), CFOpts::cf_options()}.
%% Used for open/4 to bind a column family handle with its name in this function
-type bind_fun() :: fun((CFName::binary(), CFHandle::cf_handle()) -> void()).
%% Open with the explicitly specified column families
-spec(open(Name, DBOpts, CFDescriptors, BindFunc) ->
{ok, db_handle()} | {error, any()}
when Name::file:filename_all(),
DBOpts :: db_options(),
CFDescriptors :: list(cf_descriptor()),
BindFunc :: bind_fun()).
Please correct me if there are something differences from what you think.
@mocchira yes this si exactly what I was thinking of, thanks for the summary :) If it's OK i will work on that. I should have a patch until monday :)
great to hear that! ok, please go ahead. we are looking forward to your PR! 2016/01/28 18:12 "Benoit Chesneau" [email protected]:
@mocchira https://github.com/mocchira yes this si exactly what I was thinking of, thanks for the summary :) If it's OK i will work on that. I should have a patch until monday :)
— Reply to this email directly or view it on GitHub https://github.com/leo-project/erocksdb/issues/3#issuecomment-176076587.