node-lmdb
Supporting larger integer keys
Currently, the largest integer key node-lmdb supports is a 4-byte unsigned integer (keyIsUint32), which yields a little over 4 billion values. That may be enough, but I'm a bit concerned about being limited. I realize the maximum integer supported by a JavaScript number is 2^53 - 1, but that number is orders of magnitude larger than a Uint32. Have you considered adding a Uint53 key type (UintJSMax or something similar)? I realize that on the C side it would still have to be stored as a 64-bit integer, but losing 11 bits is, at least for me, preferable to having to use a string or buffer. I couldn't tell for sure, but doesn't LMDB store integer keys as native 64-bit integers anyway?
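For scale, the two limits mentioned here differ by a factor of 2^21. A quick check in TypeScript (plain arithmetic, nothing node-lmdb specific):

```typescript
// Plain arithmetic, not node-lmdb code.
// Largest key a 4-byte unsigned integer (keyIsUint32) can hold.
const MAX_UINT32: number = 2 ** 32 - 1;
// Largest integer a JavaScript number represents exactly (2^53 - 1).
const MAX_SAFE: number = Number.MAX_SAFE_INTEGER;
// How many times larger the 53-bit key space is: 2^53 / 2^32 = 2^21.
const ratio: number = (MAX_SAFE + 1) / (MAX_UINT32 + 1);

console.log(MAX_UINT32, MAX_SAFE, ratio); // 4294967295 9007199254740991 2097152
```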
Just a thought. Thanks!
@nwbrad You can use a Buffer as the key, which can be much larger (up to LMDB's maximum key size, 511 bytes by default).
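One way to make a Buffer key sort like a number under LMDB's default byte-by-byte (memcmp-style) ordering is to write the number big-endian, so the most significant byte comes first. The helper names below are hypothetical, not node-lmdb API; the sketch uses a plain Uint8Array (in Node, `Buffer.from(bytes.buffer)` would wrap it without copying) and assumes an ES2020 target for BigInt:

```typescript
// Hypothetical helper, not node-lmdb API: encode a non-negative safe integer
// as an 8-byte big-endian key (most significant byte first).
function encodeBigEndianKey(n: number): Uint8Array {
  if (!Number.isSafeInteger(n) || n < 0) {
    throw new RangeError(`key must be a non-negative safe integer, got ${n}`);
  }
  const bytes = new Uint8Array(8);
  // The last argument `false` selects big-endian byte order.
  new DataView(bytes.buffer).setBigUint64(0, BigInt(n), false);
  return bytes;
}

// Byte-by-byte comparison in the style of memcmp, shorter key first on a tie.
function compareBytes(a: Uint8Array, b: Uint8Array): number {
  const len = Math.min(a.length, b.length);
  for (let i = 0; i < len; i++) {
    if (a[i] !== b[i]) return a[i] - b[i];
  }
  return a.length - b.length;
}

console.log(compareBytes(encodeBigEndianKey(255), encodeBigEndianKey(256)) < 0); // true
```

Little-endian bytes would not sort this way: 255 would start with 0xff and 256 with 0x00, so 256 would sort first.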
Sure, I understand that, but it isn't processed the same way an 8-byte integer is. Integer key lookups are substantially faster because each comparison is a single machine-word comparison. Howard Chu talked about it in one of his presentations.
Just a thought.
@nwbrad I will think about what the best way is to implement this. Maybe there could be a special buffer key type which only allows 64-bit buffers, and I could enable the integer key optimization for that. What do you think?
I'm not sure about the best way. Personally, I think it should stick as closely to the C LMDB implementation as possible, which would mean using a Uint64. A Uint64 would work as long as it were clearly documented that JS integers are limited to 53 bits: although the key is stored as a 64-bit int, the functional JS limit is a 53-bit integer. That's why I thought a name like UintJSMax would work well; its unusual name would force people to read about it.
For the buffer method, you would take a JS number, convert it to a 64-bit buffer, and then store that as a UInt64? That could work, but it would likely be more confusing to users and probably more difficult to manage.
@nwbrad I would not use a UInt64, because that would prevent users that use actual 64-bit numbers in other LMDB applications from using their existing databases with node-lmdb.
About the buffer idea:
The buffer would not need to be converted. The new key type would enforce the use of a 64-bit (8-byte) buffer as the key and would enable MDB_INTEGERKEY, and that's all there is to it.
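LMDB documents MDB_INTEGERKEY keys as numeric keys in native byte order, all of the same size. A minimal sketch of what the proposed 8-byte key type might check, assuming the key arrives as a Uint8Array and an ES2020 target; `readUint64Key` is a hypothetical name, not part of node-lmdb:

```typescript
// True when the host stores the least significant byte first (e.g. x86-64).
const HOST_LITTLE_ENDIAN: boolean =
  new Uint8Array(new Uint16Array([1]).buffer)[0] === 1;

// Hypothetical check for the proposed key type: exactly 8 bytes, read as an
// unsigned integer in native byte order, which is how MDB_INTEGERKEY sees it.
function readUint64Key(key: Uint8Array): bigint {
  if (key.byteLength !== 8) {
    throw new RangeError(`64-bit integer keys must be exactly 8 bytes, got ${key.byteLength}`);
  }
  // Respect byteOffset so views into a larger buffer are read correctly.
  return new DataView(key.buffer, key.byteOffset, 8).getBigUint64(0, HOST_LITTLE_ENDIAN);
}
```

Rejecting any other length up front matters because MDB_INTEGERKEY requires every key in the database to have the same size.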
Sounds like it could work, especially if the buffer would not need to be converted and the values can be managed via a normal JS number variable.
The values cannot be managed via a normal JS number variable because of what you said - it doesn't support 64-bit values properly.
Yeah, I was afraid of that. What I need is the ability to have number keys, managed as numbers, into the trillions, which would not work with a 32-bit integer. Not supporting 64-bit integers is one of the unfortunate parts of JavaScript. I was hoping the solution could use ES6's Number.MAX_SAFE_INTEGER (2^53 - 1) and Number.isSafeInteger(): [https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Number/MAX_SAFE_INTEGER]
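The rule being asked for, an unsigned key that a JS number represents exactly, fits in a small type guard built on Number.isSafeInteger(). `isValidUint53Key` is a hypothetical name used only for illustration:

```typescript
// Hypothetical check for a "UintJSMax"-style key: a non-negative integer
// that a JS number represents exactly (0 .. 2^53 - 1).
function isValidUint53Key(value: unknown): value is number {
  return typeof value === "number" && Number.isSafeInteger(value) && value >= 0;
}

console.log(isValidUint53Key(2 ** 32));                     // true: just past the Uint32 range
console.log(isValidUint53Key(Number.MAX_SAFE_INTEGER));     // true
console.log(isValidUint53Key(Number.MAX_SAFE_INTEGER + 1)); // false: 2^53 is not "safe"
console.log(isValidUint53Key(-1));                          // false: keys are unsigned
console.log(isValidUint53Key(1.5));                         // false: not an integer
```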
Thanks for looking at it.
@nwbrad Why do you not want to work with buffers?
My main apprehension about using buffers is the need to change the way a working application currently functions. Adopting a new, untested method of storing existing keys, plus the overhead of converting numbers to and from buffers, concerns me. I see changing the key type and reloading the data as far safer than having to change all the existing JavaScript code that accesses LMDB.
@nwbrad Why is it untested? Why would you need to change all existing javascript code?
We use integer keys as unique identifiers across multiple databases (10+), and they are shown and used extensively throughout our application. By untested, I mean that using the new method in our code is untested and it is not a drop-in replacement, so every part of our code would be untested with the new approach. Switching to a JS-safe large integer would require almost no code changes; it would just mean changing the key type and reloading the data.
@nwbrad Well, if you think something like that 53-bit integer would help, I could implement it for you.
@nwbrad However it won't be a drop-in replacement because then the new database will be binary incompatible with the old one.
Venemo, thank you for the offer, but I'm not sure it's worth your effort if it is not binary compatible. I don't want you to go to the effort if it doesn't extend the existing base. My hope was for an enhancement of the current version that just offers JS-limited 64-bit integer support (within the JS safe integer boundaries), with everything else staying the same.
@nwbrad It is possible, but then you will have 32-bit and 64-bit keys mixed in the same database, so you will need to disable the MDB_INTEGERKEY optimization. Then it can be binary compatible with your old database while storing new keys as 64-bit (or 53-bit) integers.
OK, I completely misunderstood you. I thought you meant I could not use the same binary (the node-lmdb module) to access data files with 32-bit or 64-bit integer keys. Personally, I would not expect to have 32-bit and 64-bit keys in the same database, so for me this would not be a limitation.
Right or wrong, with LMDB I tend to create multiple databases, both to minimize the impact of having a single write lock per database and to improve data locality. I know it adds a bit of memory overhead, but I think it's worth it.
I believe the change to allow 64-bit (or 53-bit) integers is worth it. Thank you!
@nwbrad I said the new database (with 64-bit keys) would not be binary compatible with the old one (with 32-bit keys). I wasn't talking about the module itself.
Yep. My mistake, I misread it.
@nwbrad Are you still interested in this? I've looked at the V8 docs, but it seems V8 doesn't have a suitable data type for storing these keys: https://v8docs.nodesource.com/node-8.9/dc/d0a/classv8_1_1_value.html
Unfortunately, V8's Integer::New only takes a 32-bit number (even though Integer::Value() returns a 64-bit value).
Yes, I'm still interested. I want V8/JavaScript with LMDB to be able to use JavaScript number keys up to MAX_SAFE_INTEGER (a 53-bit integer). Are you saying that V8/Node does not support numbers up to MAX_SAFE_INTEGER?
I was expecting something like this: a JS number (up to MAX_SAFE_INTEGER) translates to an LMDB C 64-bit integer and back. Is that not possible? I don't think it would need to be stored as an int in JS, just as a number... Am I missing something?
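The translation asked about here is lossless in one direction only: every non-negative safe integer widens to a uint64 exactly, but a uint64 written by another LMDB application can exceed 2^53 - 1. A sketch that uses bigint to stand in for the C-side 64-bit value (ES2020 target assumed; both function names are hypothetical):

```typescript
const MAX_SAFE_BIG: bigint = BigInt(Number.MAX_SAFE_INTEGER);

// Hypothetical conversion on write: a safe, non-negative number widens losslessly.
function numberToUint64(n: number): bigint {
  if (!Number.isSafeInteger(n) || n < 0) {
    throw new RangeError(`${n} is not a non-negative safe integer`);
  }
  return BigInt(n);
}

// Hypothetical conversion on read: only values in the safe range become numbers;
// anything larger would silently lose precision, so it is rejected instead.
function uint64ToNumber(value: bigint): number {
  if (value < 0n || value > MAX_SAFE_BIG) {
    throw new RangeError(`uint64 key ${value} cannot be represented exactly as a JS number`);
  }
  return Number(value);
}

console.log(uint64ToNumber(numberToUint64(5e12)) === 5e12); // true: "into the trillions" round-trips
```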
@nwbrad Unfortunately, yes. Some digging also turned up this:
https://github.com/thlorenz/v8-perf/issues/3#issuecomment-45545604
Relevant quote:
V8 integer sizes:
- In 64-bit up to 32-bit signed
- In 32-bit up to 31-bit signed
Outside these ranges, numbers are represented as boxed doubles (in special case like locals of optimized code, pure float arrays, doubles are stored immediately)
This finding agrees with what I found in the V8 docs.
So we could perhaps pass the 64-bit integer to and from the JavaScript side as a Number (which is in fact a double), but I'm not sure that is actually a good idea. Would we lose precision if we stored MAX_SAFE_INTEGER as a double?
I think we would be fine representing the integer as a Number up to MAX_SAFE_INTEGER.
From Mozilla (MDN): "The reasoning behind that number is that JavaScript uses double-precision floating-point format numbers as specified in IEEE 754 and can only safely represent numbers between -(2^53 - 1) and 2^53 - 1. Safe in this context refers to the ability to represent integers exactly and to correctly compare them."
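The quoted limit can be checked directly: integers up to Number.MAX_SAFE_INTEGER each have their own double, while above it neighbouring integers start to collapse onto the same value. A short demonstration (ES2020 target assumed for the bigint literal):

```typescript
const limit: number = Number.MAX_SAFE_INTEGER; // 2^53 - 1

// Up to the limit, adjacent integers remain distinct doubles...
console.log(limit - 1 === limit);           // false
// ...but just above it, 2^53 + 1 rounds back to 2^53.
console.log(2 ** 53 === 2 ** 53 + 1);       // true
console.log(Number.isSafeInteger(2 ** 53)); // false

// Converting the limit to a bigint shows its exact value survived as a double.
console.log(BigInt(limit) === 9007199254740991n); // true
```

So storing MAX_SAFE_INTEGER itself as a double loses nothing; precision loss begins only past it.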
@nwbrad All right, I guess I can do it like this!
Great, thank you! I think it will allow your tool to use LMDB to the limits of JavaScript. Maybe someday JavaScript will have true 64-bit integers, but until then I imagine this will be more than sufficient for 99.9% of users. I know it will resolve my concerns.
I added a pull request that addresses this: https://github.com/Venemo/node-lmdb/pull/132
@Venemo What if we implement this with the new BigInt type? The logic would be: if the key is a bigint, store it as a binary buffer; but if it's a bigint and the database key type is set to integer via an additional option, try to fit the bigint in 64 bits. If it can't fit, fail the insert with a descriptive error message. I could create a PR for this if you agree with the logic above.
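A minimal sketch of the proposed fit check, assuming unsigned 64-bit keys (MDB_INTEGERKEY compares unsigned integers) and an ES2020 target. `bigintToUint64Key` and its `littleEndian` parameter are hypothetical; a real binding would pass the host's native byte order. BigInt.asUintN(64, x) returns x unchanged only when x already lies in 0 to 2^64 - 1:

```typescript
const UINT64_MAX: bigint = (1n << 64n) - 1n;

// Hypothetical helper: accept a bigint key only if it fits in an unsigned
// 64-bit integer, otherwise fail with a descriptive error.
function bigintToUint64Key(key: bigint, littleEndian: boolean): Uint8Array {
  // asUintN wraps out-of-range values, so any change means the key does not fit.
  if (BigInt.asUintN(64, key) !== key) {
    throw new RangeError(
      `bigint key ${key} does not fit in an unsigned 64-bit integer (0..${UINT64_MAX})`
    );
  }
  const bytes = new Uint8Array(8);
  new DataView(bytes.buffer).setBigUint64(0, key, littleEndian);
  return bytes;
}
```

Checking with asUintN rather than relying on setBigUint64 matters because setBigUint64 silently wraps out-of-range values instead of throwing.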
@matthewaveryusa Sounds good, though the feature then needs to be #ifdef'ed so that it only works on Node versions that support BigInt. Looking forward to the PR.