node-lmdb
Support non-zero-terminated UTF-16 strings in node-lmdb
For the LMDB file we use, whenever I call 'cursor.goToFirst()', I get: fatal: Invalid zero-terminated UTF-16 string stack=Error: Invalid zero-terminated UTF-16 string
However, the same LMDB file works with the Python and Go LMDB libraries: I can read strings and iterate over it without errors.
@hanweifish Please provide the exact steps to reproduce this problem. You get that error when you try to read a UTF-16 string that is not zero-terminated, but goToFirst does not throw that.
When I iterate over our LMDB, the error log points at this code:
for (var found = cursor.goToFirst(); found !== null; found = cursor.goToNext()) {
  console.log('key', found);
  console.log('value', txn.getString(dbi, found));
}
If the database stores UTF-16 strings that are not zero-terminated, how can I solve the problem? I tested the same LMDB file with the Go and Python LMDB libraries, and both work well.
@hanweifish In that case the data is not a valid zero-terminated UTF-16 string.
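To make this concrete, here is a rough sketch of what a zero-terminated UTF-16 check has to verify. It assumes the stored value is UTF-16LE code units followed by a single 0x0000 terminator; that layout is an assumption for illustration, not node-lmdb's actual validation code, and isZeroTerminatedUtf16 is a hypothetical helper. Only Node's built-in Buffer is used.

```javascript
// Hypothetical helper: does this buffer look like a zero-terminated UTF-16 string?
// ASSUMPTION: UTF-16LE code units plus a trailing 0x0000 code unit.
function isZeroTerminatedUtf16(buf) {
  // UTF-16 is made of 2-byte code units, so the byte length must be even,
  // and there must be room for at least the 2-byte terminator.
  if (buf.length < 2 || buf.length % 2 !== 0) {
    return false;
  }
  // The last code unit must be 0x0000.
  return buf.readUInt16LE(buf.length - 2) === 0;
}

// A value laid out as UTF-16LE followed by a zero code unit.
const written = Buffer.concat([Buffer.from('locales', 'utf16le'), Buffer.alloc(2)]);
// The same text encoded as plain UTF-8 by some other program: 7 bytes, no terminator.
const foreign = Buffer.from('locales', 'utf8');

console.log(isZeroTerminatedUtf16(written)); // true
console.log(isZeroTerminatedUtf16(foreign)); // false
```

Under this assumption, any value written by a program that did not use this exact layout (odd byte length, or no trailing zero code unit) is rejected, even though other bindings that simply hand back raw bytes read it without complaint.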
@hanweifish Explanation: in the beginning it was unclear whether V8 requires zero termination on its strings or not, so I added the terminating zero just in case. These days V8 is much better documented, and it's clear that the terminator is not necessary. However, if I removed it now, I'd break it for everyone who created their database with older versions of node-lmdb.
So, sadly, this isn't a simple question. I will think about how best to solve this.
Until this is solved, I can give you a temporary solution:
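var cursor = new lmdb.Cursor(txn, dbi, { keyIsBuffer: true });
for (var key = cursor.goToFirst(); key !== null; key = cursor.goToNext()) {
  // key is a Buffer here, so decode it explicitly.
  // Node calls UTF-16 little-endian 'utf16le'; plain 'utf16' is not a valid encoding name.
  var keyString = key.toString('utf16le');
  console.log('key', keyString);
  console.log('value', txn.getBinary(dbi, key).toString('utf16le'));
}
cursor.close();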
Just typed into the comment window, so there might be typos in there. Good luck!
Use the keyIsBuffer option to read/write keys as buffers. Convert the buffer to a string and all will be fine.
@da77a Yep, that's exactly what I suggested. :)
In the long run, though, I will add an option that allows the use of non-zero-terminated strings as well.
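One way such an option could work, sketched with hypothetical helpers: writes can choose whether to append the terminator, while reads accept both layouts by stripping at most one trailing zero code unit. This assumes UTF-16LE storage; encodeUtf16, decodeUtf16 and the zeroTerminated option name are made up for illustration and are not part of node-lmdb.

```javascript
// HYPOTHETICAL helpers; the `zeroTerminated` option name is invented.
// ASSUMPTION: strings are stored as UTF-16LE.

// Encode a JS string as UTF-16LE, optionally appending a 0x0000 code unit
// (the legacy layout older databases would contain).
function encodeUtf16(str, { zeroTerminated = true } = {}) {
  const body = Buffer.from(str, 'utf16le');
  return zeroTerminated ? Buffer.concat([body, Buffer.alloc(2)]) : body;
}

// Decode UTF-16LE, stripping one trailing 0x0000 code unit if present,
// so terminated (old) and unterminated (new) values read back the same.
function decodeUtf16(buf) {
  if (buf.length % 2 !== 0) {
    throw new Error('UTF-16 data must have an even byte length');
  }
  let end = buf.length;
  if (end >= 2 && buf.readUInt16LE(end - 2) === 0) {
    end -= 2;
  }
  return buf.toString('utf16le', 0, end);
}

const legacy = encodeUtf16('hello');                            // 12 bytes
const modern = encodeUtf16('hello', { zeroTerminated: false }); // 10 bytes
console.log(decodeUtf16(legacy) === decodeUtf16(modern));      // true
```

One caveat of this tolerant read: a string whose last character really is U+0000 becomes indistinguishable from a terminated one, which is one argument for making the new behaviour an explicit opt-in rather than silently changing the default.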
Thanks. Works as buffers. Finally!
But I have another question about encoding. In our LMDB, the key is
<Buffer@0x102224540 6c 6f 63 61 6c 65 73>, and calling toString() on it gives: key locales
For this Buffer, how can I tell which encoding it uses? I want to generate the same buffer from the key string via new Buffer(). I tried every encoding, but none of them matched.
Thanks
@hanweifish Generally, you are expected to know what encoding you store your strings with.
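For this particular key, the bytes themselves answer the question: 6c 6f 63 61 6c 65 73 are the ASCII codes for "locales", and ASCII text encodes identically in UTF-8 (and Latin-1), so re-encoding the string as UTF-8 should reproduce the buffer. The sketch below uses only Node's built-in Buffer. It uses Buffer.from() because the new Buffer() constructor is deprecated, and it compares with buf.equals(), since two distinct Buffer objects are never equal under == or ===, which would make every encoding appear not to match.

```javascript
// The bytes from the thread, copied into a Buffer.
const key = Buffer.from([0x6c, 0x6f, 0x63, 0x61, 0x6c, 0x65, 0x73]);

// Every byte is below 0x80, so this is plain ASCII, which is also valid UTF-8.
console.log(key.every((b) => b < 0x80)); // true

// Re-encoding the decoded text as UTF-8 gives back identical bytes...
console.log(Buffer.from('locales', 'utf8').equals(key)); // true

// ...whereas UTF-16LE uses two bytes per character here, so it cannot match.
console.log(Buffer.from('locales', 'utf16le').length); // 14
```

This only works because the bytes happen to be ASCII; in general a buffer does not record its own encoding, which is why the writer needs to know which one was used.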
Ha, thanks for your help.
Let's keep this issue open for now, to keep track of non-zero-terminated strings support.
You could bump the major version of this library. If you are following strict semver, you have the freedom to break the API until version 1.0; you are currently still on 0.x, so breaking this is OK. That said, I do understand the reluctance to break current clients.
@matthewaveryusa I know I could do whatever I want, but I don't want to give a hard time to my users, is all.
I ran into this same problem when trying to get the database names from the unnamed database. Setting keyIsBuffer to true and treating the key as a buffer fixes it!
var dbi = env.openDbi({name: null}); // Main DB
var txn = env.beginTxn();
// Iterate through database names
var cursor = new lmdb.Cursor(txn, dbi, { keyIsBuffer: true });
for (var found = cursor.goToFirst(); found !== null; found = cursor.goToNext()) {
console.log("Database Name:", found.toString());
}
cursor.close();
txn.commit();