
Support non-zero-terminated UTF-16 strings in node-lmdb

Open hanweifish opened this issue 6 years ago • 15 comments

With the LMDB file we use, calling 'cursor.goToFirst()' always fails with: fatal: Invalid zero-terminated UTF-16 string stack=Error: Invalid zero-terminated UTF-16 string

However, the same LMDB file works with the Python and Go LMDB libraries: I can read strings and iterate without errors.

hanweifish avatar May 17 '18 00:05 hanweifish

@hanweifish Please provide the exact steps to reproduce this problem. You get that error when you try to read a UTF-16 string and it's not zero terminated, but goToFirst does not throw that.

Venemo avatar May 17 '18 06:05 Venemo

When I iterate over our LMDB file, the error log points at this loop:

for (var found = cursor.goToFirst(); found !== null; found = cursor.goToNext()) {
    console.log('key', found);
    console.log('value', txn.getString(dbi, found));
}

If the database contains UTF-16 strings that are not zero-terminated, how can I solve the problem? I tested the same LMDB file with the Go and Python LMDB libraries, and both work well.

hanweifish avatar May 17 '18 06:05 hanweifish

@hanweifish In that case the data is not a valid zero-terminated UTF-16 string.

Venemo avatar May 17 '18 06:05 Venemo

@hanweifish Explanation: in the beginning it was unclear whether V8 requires zero termination on its strings or not, so I added the terminating zero just in case. These days V8 is much better documented and it's clear that the terminator is not necessary. However, if I removed it now, I'd break it for everyone who created their database with older versions of node-lmdb.

So, sadly, this isn't a simple question. I will think about how best to solve this.
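
To make the terminator issue concrete, here is a minimal sketch using only Node's Buffer API. It assumes little-endian UTF-16 with a two-byte zero terminator; the helper names `encodeTerminated` and `decodeUtf16` are hypothetical and not part of node-lmdb.

```javascript
// Sketch only: both helpers are hypothetical, not node-lmdb API.

// Encode a string as UTF-16LE followed by a two-byte zero terminator
// (assumed to match the layout described above).
function encodeTerminated(str) {
    return Buffer.concat([Buffer.from(str, 'utf16le'), Buffer.alloc(2)]);
}

// Decode UTF-16LE bytes, dropping one trailing zero code unit if present,
// so that terminated and non-terminated data both decode cleanly.
function decodeUtf16(buf) {
    var end = buf.length;
    if (end >= 2 && buf.readUInt16LE(end - 2) === 0) {
        end -= 2;
    }
    return buf.toString('utf16le', 0, end);
}

var terminated = encodeTerminated('locales');
var bare = Buffer.from('locales', 'utf16le');

console.log(terminated.length, bare.length);               // 16 14
console.log(decodeUtf16(terminated) === decodeUtf16(bare)); // true
```

A tolerant decoder like this reads both layouts, at the cost of not being able to tell apart a stored string that genuinely ends in U+0000.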

Venemo avatar May 17 '18 06:05 Venemo

Until this is solved, I can give you a temporary solution:

var cursor = new lmdb.Cursor(txn, dbi, { keyIsBuffer: true });
for (var key = cursor.goToFirst(); key !== null; key = cursor.goToNext()) {
    // Node's Buffer names this encoding 'utf16le'; 'utf16' is not accepted.
    var keyString = key.toString('utf16le');
    console.log('key', keyString);
    console.log('value', txn.getBinary(dbi, key).toString('utf16le'));
}

Just typed into the comment window, so there might be typos in there. Good luck!

Venemo avatar May 17 '18 06:05 Venemo

Use the keyIsBuffer parameter to read/write keys as buffers. Convert the buffer to a string and all will be fine.

da77a avatar May 17 '18 06:05 da77a

@da77a Yep, that's exactly what I suggested. :)

Venemo avatar May 17 '18 07:05 Venemo

In the long run, though, I will add an option that allows the use of non-zero-terminated strings as well.

Venemo avatar May 17 '18 07:05 Venemo

Thanks. It works with buffers. Finally!

But I have another question about encoding. In our LMDB file:

key <Buffer@0x102224540 6c 6f 63 61 6c 65 73>, and with toString() we get => key locales

How can I tell which encoding this Buffer uses, so that I can generate the same buffer from the key string via new Buffer()? I tried every encoding, but none of them matched.

Thanks

hanweifish avatar May 17 '18 07:05 hanweifish

@hanweifish Generally, you are expected to know what encoding you store your strings with.
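
For the specific bytes quoted above, the encoding can be checked by hand: seven bytes for the seven characters of "locales" means one byte per character, which rules out UTF-16 and matches ASCII/UTF-8. A small sketch using only Node's Buffer API (whether the rest of the database uses the same encoding is an assumption that has to be verified against whatever wrote it):

```javascript
// The key bytes quoted in the question.
var key = Buffer.from([0x6c, 0x6f, 0x63, 0x61, 0x6c, 0x65, 0x73]);

// One byte per character, so this decodes as UTF-8 (or ASCII), not UTF-16.
console.log(key.toString('utf8'));   // locales

// Rebuild the same key from a string. Buffer.from replaces the
// deprecated new Buffer() constructor.
var rebuilt = Buffer.from('locales', 'utf8');
console.log(rebuilt.equals(key));    // true

// The UTF-16LE form of the same text is twice as long and does not match.
var asUtf16 = Buffer.from('locales', 'utf16le');
console.log(asUtf16.length, asUtf16.equals(key));   // 14 false
```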

Venemo avatar May 17 '18 11:05 Venemo

Ha, Thanks for your help.

hanweifish avatar May 17 '18 17:05 hanweifish

Let's keep this issue open for now, to keep track of non-zero-terminated strings support.

Venemo avatar May 18 '18 07:05 Venemo

You can bump the major version of this library. If you are following strict 'semver', then you have the freedom to break API until version 1.x -- you are currently still in 0.x so breaking this is OK -- but I do understand the reluctance to break current clients.

matthewaveryusa avatar Feb 18 '19 03:02 matthewaveryusa

@matthewaveryusa I know I could do whatever I want, but I don't want to give a hard time to my users, is all.

Venemo avatar Feb 18 '19 08:02 Venemo

I ran into this same problem when trying to get the database names from the unnamed database. Setting keyIsBuffer to true and treating the key as a buffer fixes it!

var dbi = env.openDbi({name: null}); // Main DB
var txn = env.beginTxn();
// Iterate through database names
var cursor = new lmdb.Cursor(txn, dbi, { keyIsBuffer: true });
for (var found = cursor.goToFirst(); found !== null; found = cursor.goToNext()) {
	console.log("Database Name:", found.toString());
}
cursor.close();
txn.commit();

DavidPesta avatar May 24 '20 18:05 DavidPesta