lmdbxx Cursor.get returns key and value concatenated into key variable

Open DrissiReda opened this issue 5 years ago • 10 comments

I get odd output from cursor.get: the key variable contains both the key and the value, while the value variable contains just the value:

#include <cstdio>
#include <cstdlib>
#include <lmdb++.h>
using namespace lmdb;
int getsize(const lmdb::env& e){
    auto t = lmdb::txn::begin(e.handle(), nullptr, MDB_RDONLY);
    auto d = lmdb::dbi::open(t, nullptr);
    int r=d.size(t);
    t.abort();
    return r;
}


int main() {
  auto env = lmdb::env::create();
  env.set_mapsize(1UL * 1024UL * 1024UL * 1024UL); /* 1 GiB */
  env.open("./example.mdb", 0, 0664);
  {
    auto wtxn = lmdb::txn::begin(env);
    auto dbi = lmdb::dbi::open(wtxn, nullptr);
    char a[6] = "hello";
    dbi.put(wtxn, "email", "hello");
    dbi.put(wtxn, "key", "value");
    dbi.put(wtxn, "user", "johndoe");
    wtxn.commit();
  }
  {
      auto rtxn = lmdb::txn::begin(env);
      auto dbi = lmdb::dbi::open(rtxn, nullptr);
      auto cursor = lmdb::cursor::open(rtxn, dbi);
      lmdb::val k, v;
      while(cursor.get(k, v, MDB_NEXT)){
        printf("We got '%s'\nValue '%s'\n", k.data(), v.data());
      }
  }
  {
    std::printf("size is %d\n", getsize(env));
  }
return EXIT_SUCCESS;
}

Expected Output:

We got 'email'
Value 'hello'
We got 'key'
Value 'value'
We got 'user'
Value 'johndoe'
size is 3

Actual output:

We got 'emailhello'
Value 'hello'
We got 'keyvalue'
Value 'value'
We got 'userjohndoe'
Value 'johndoe'
size is 3

Mar 20 '19 08:03 DrissiReda

It looks like you are assuming your values will be NUL-terminated:

printf("We got '%s'\nValue '%s'\n", k.data(), v.data());

But your puts above aren't writing the NUL byte.

Don't use the C string routines since you won't be able to store NUL bytes in your keys/values. Instead do something like:

std::string myKey(k.data(), k.size());
std::string myValue(v.data(), v.size());
std::cout << "We got " << myKey << " and " << myValue << std::endl;

(untested)
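
To illustrate why the key printed as 'emailhello', here is a minimal sketch with no LMDB dependency. It assumes, purely for illustration, that the key bytes are immediately followed by the value bytes and then a zero byte in memory, which would produce exactly the output reported above. `Slice`, `to_string`, `read_like_percent_s` and `print_bounded` are hypothetical names; `Slice` only stands in for a pointer-plus-length value such as `lmdb::val`.

```cpp
#include <cstddef>
#include <cstdio>
#include <cstring>
#include <string>

// Hypothetical stand-in for lmdb::val: a pointer plus a length, no terminator.
struct Slice {
    const char* data;
    std::size_t size;
};

// Copy exactly `size` bytes; never scans for a NUL byte.
std::string to_string(const Slice& s) {
    return std::string(s.data, s.size);
}

// What "%s" effectively does: keep reading until the first NUL byte.
// Only safe here because the demo buffer is known to end with a NUL.
std::string read_like_percent_s(const Slice& s) {
    return std::string(s.data, std::strlen(s.data));
}

// Length-bounded printf ("%.*s"), usable from C++11 without a std::string copy.
void print_bounded(const char* label, const Slice& s) {
    std::printf("%s '%.*s'\n", label, static_cast<int>(s.size), s.data);
}
```

With a buffer holding the bytes of "emailhello", a `Slice` of length 5 at the start converts to "email" via `to_string`, while `read_like_percent_s` on the same slice runs on into the value. `print_bounded` is one way to keep using printf under C++11.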

Or, better yet, upgrade to C++17 and use my fork and get the string_view hotness :)

Mar 20 '19 14:03 hoytech

I know about string_view and about your fork, but I have constraints that prevent me from going above C++11. Since you're here, could you tell me how I can force my database to be sorted by insertion order (instead of lexicographically)? Also, on first creation of the database, it always crashes with "bus_error", but if I open the environment, close it, then open it again and execute my code, it works.

EDIT: How can I make sure my puts do add the null byte, in order to avoid this problem?

Mar 20 '19 15:03 DrissiReda

1. Make your keys increase every time you insert, or use a secondary index.
2. I don't know; I'd need code to reproduce it. Make sure you are setting a big enough mapsize, and the same mapsize each time. Make sure you commit the transaction that creates the tables.
3. I suggest not storing the NUL byte. It's a waste of space since LMDB tracks the size anyway. Furthermore, relying on it means you cannot store NUL bytes as part of your keys or values (as I described above).
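
The first suggestion above (keys that increase on every insert) could look roughly like the sketch below. It is an illustration rather than lmdbxx code: `std::map` stands in for a table sorted by byte-wise key comparison, and `encode_seq` and `OrderedStore` are hypothetical names. The counter is encoded as 8 big-endian bytes so that lexicographic order of the encoded keys matches numeric order of the counter.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Encode a counter as 8 big-endian bytes so that byte-wise (lexicographic)
// comparison of the encoded keys matches numeric order of the counters.
std::string encode_seq(std::uint64_t n) {
    std::string out(8, '\0');
    for (int i = 7; i >= 0; --i) {
        out[i] = static_cast<char>(n & 0xFF);
        n >>= 8;
    }
    return out;
}

// Hypothetical stand-in for a table sorted lexicographically by key bytes.
struct OrderedStore {
    std::map<std::string, std::string> rows;
    std::uint64_t next_seq = 0;

    // Each insert gets the next sequence number as its key.
    void append(const std::string& value) {
        rows[encode_seq(next_seq++)] = value;
    }

    // Iterating in key order now yields insertion order.
    std::vector<std::string> values_in_key_order() const {
        std::vector<std::string> out;
        for (const auto& kv : rows) out.push_back(kv.second);
        return out;
    }
};
```

Appending "user", "email", "key" in that order and then calling `values_in_key_order()` returns them in the same order. The trade-off is that the original string is no longer the key, so looking an item up by name needs a separate table.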

Mar 20 '19 15:03 hoytech

How can I use a secondary index?

Mar 20 '19 18:03 DrissiReda

Every time you insert into your main table, you also insert into another table. In this secondary table, the key is an increasing integer, and the value is the key into your main table.

Then when you want to iterate over your items in insertion order, iterate through the secondary index. For each value, use it as a key to look up the item from the main table.
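
A minimal sketch of this two-table scheme follows. It deliberately avoids the LMDB API: two `std::map` objects stand in for the main table and the secondary table, and `IndexedStore`, `put` and `in_insertion_order` are hypothetical names. In a real LMDB setup the integer sequence would need to be encoded as key bytes, and both writes would presumably go in the same write transaction so the tables stay consistent.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

struct IndexedStore {
    std::map<std::string, std::string> main_table;     // key -> value (sorted by key)
    std::map<std::uint64_t, std::string> order_index;  // sequence -> key in main_table
    std::uint64_t next_seq = 0;

    void put(const std::string& key, const std::string& value) {
        // Only new keys get an index entry, so overwriting a key
        // keeps its original position.
        if (main_table.find(key) == main_table.end()) {
            order_index[next_seq++] = key;
        }
        main_table[key] = value;
    }

    // Walk the secondary index, then look each key up in the main table.
    std::vector<std::pair<std::string, std::string>> in_insertion_order() const {
        std::vector<std::pair<std::string, std::string>> out;
        for (const auto& entry : order_index) {
            auto it = main_table.find(entry.second);
            if (it != main_table.end()) {
                out.emplace_back(it->first, it->second);
            }
        }
        return out;
    }
};
```

After `put("user", ...)`, `put("email", ...)`, `put("key", ...)`, iterating `main_table` directly yields email, key, user, while `in_insertion_order()` yields user, email, key.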

Mar 20 '19 20:03 hoytech

I'm sorry for all these questions, but I can't find anywhere else to ask them. Is it possible to get the position of an entry without iterating through my database with a cursor and incrementing a counter?

Mar 22 '19 13:03 DrissiReda

It's OK. What do you mean by "get the position"? You can position the cursor directly at a known key with cursor ops like MDB_SET, or go to somewhere nearby with MDB_SET_RANGE.

If you mean getting an item by its position index (i.e., getting the 10th element in the DB), then AFAIK this is not possible. If you're always appending to the DB (and never removing or inserting in the middle), then you could maintain the position index in a secondary index.
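
The difference between the two cursor ops mentioned above can be illustrated with a sorted `std::map` as an analogue. This is not LMDB code: `seek_exact` mimics MDB_SET (position at the specified key, or fail) and `seek_range` mimics MDB_SET_RANGE (position at the first key greater than or equal to the specified key). Both function names and the `Table` alias are hypothetical.

```cpp
#include <map>
#include <string>

using Table = std::map<std::string, std::string>;

// Analogue of MDB_SET: exact match only.
// Returns a pointer to the stored key, or nullptr if the key is absent.
const std::string* seek_exact(const Table& t, const std::string& key) {
    auto it = t.find(key);
    return it == t.end() ? nullptr : &it->first;
}

// Analogue of MDB_SET_RANGE: first key greater than or equal to `key`.
// Returns nullptr if every stored key is smaller than `key`.
const std::string* seek_range(const Table& t, const std::string& key) {
    auto it = t.lower_bound(key);
    return it == t.end() ? nullptr : &it->first;
}
```

With the keys email, key and user stored, `seek_exact(t, "k")` finds nothing, whereas `seek_range(t, "k")` lands on "key".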

Mar 22 '19 13:03 hoytech

I meant doing a dbi.get() to look up the position index of a certain key (e.g., inputting "user_email" and receiving 10). Is this possible without keeping duplicate databases as secondary indexes?

Mar 22 '19 14:03 DrissiReda

AFAIK, no.

Mar 22 '19 14:03 hoytech

I'll see which would be more cost-effective: iterating through the database and counting, or keeping a duplicate of each concerned database with indices.
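
The two options being weighed can be sketched side by side. Again this uses `std::map` as a stand-in rather than LMDB, and `position_by_scan` and `PositionIndex` are hypothetical names. The scan costs time proportional to the number of entries per lookup, while the maintained index costs extra storage and, as noted earlier in the thread, only stays valid if entries are appended and never removed or inserted in the middle.

```cpp
#include <map>
#include <string>

using Table = std::map<std::string, std::string>;

// Option 1: walk the table in key order and count. Linear in table size.
// Returns the zero-based position, or -1 if the key is absent.
long position_by_scan(const Table& t, const std::string& key) {
    long pos = 0;
    for (const auto& kv : t) {
        if (kv.first == key) return pos;
        ++pos;
    }
    return -1;
}

// Option 2: keep a key -> position map updated on every append.
// Assumes an append-only workload.
struct PositionIndex {
    std::map<std::string, long> pos_of;
    long next = 0;

    // Record the position of a newly appended key; repeats are ignored.
    void on_append(const std::string& key) {
        if (pos_of.emplace(key, next).second) ++next;
    }

    // Logarithmic lookup; returns -1 if the key was never appended.
    long lookup(const std::string& key) const {
        auto it = pos_of.find(key);
        return it == pos_of.end() ? -1 : it->second;
    }
};
```

Note that the scan reports a position in key order, while the index reports a position in append order; the two agree only when keys are appended in increasing order.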

Mar 23 '19 19:03 DrissiReda
