
Supporting larger integer keys

Open nwrbs opened this issue 7 years ago • 29 comments

Currently, the maximum integer key node-lmdb supports is a 4-byte integer (keyIsUint32), which yields a little over 4 billion values. That may be enough, but I'm a bit concerned about being limited... I realize the maximum integer supported by the JavaScript number is 2^53 - 1, but that number is still orders of magnitude larger than a Uint32. Have you considered adding a Uint53 key type (UintJSMax or something similar)? I realize that on the C side it will still have to be stored as a 64-bit integer, but losing 11 bits is, at least for me, preferable to having to use a string or buffer. I couldn't tell for sure, but isn't the integer key stored as a native 64-bit int by LMDB anyway?
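To make the ranges above concrete, here is a minimal TypeScript sketch. `keyRangeFor` is a hypothetical helper, not part of node-lmdb; it only classifies a JS number against the Uint32 limit (what keyIsUint32 allows) and against Number.MAX_SAFE_INTEGER (what a 53-bit key type would allow).

```typescript
// Largest key value the current Uint32 key type can hold: 2^32 - 1 = 4,294,967,295.
// A 53-bit key type would go up to Number.MAX_SAFE_INTEGER = 2^53 - 1.
const UINT32_MAX = 0xffffffff;

type KeyRange = "uint32" | "safe53" | "unsupported";

// Hypothetical helper: the narrowest key type that holds n without loss.
// A "safe53" key would still occupy a 64-bit slot, leaving 64 - 53 = 11 bits unused.
function keyRangeFor(n: number): KeyRange {
  if (!Number.isSafeInteger(n) || n < 0) return "unsupported";
  return n <= UINT32_MAX ? "uint32" : "safe53";
}

console.log(keyRangeFor(4_000_000_000));     // prints uint32
console.log(keyRangeFor(3_000_000_000_000)); // prints safe53 (trillions)
console.log(keyRangeFor(2 ** 53));           // prints unsupported
```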

Just a thought. Thanks!

nwrbs, Aug 29 '17

@nwbrad You can use a Buffer as key, which can be as big as you wish.

Venemo, Aug 30 '17

Sure, I understand that, but it is not processed the same way an 8-byte integer is. Integer-key gets are substantially faster since the comparison can be done in a single CPU cycle. Howard Chu talked about it in one of his presentations.
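One part of the difference can be illustrated in plain TypeScript: without MDB_INTEGERKEY, LMDB orders keys byte-wise (memcmp on the common prefix, then by length), while MDB_INTEGERKEY compares them as native unsigned integers. The sketch below only shows ordering, not speed; `encodeU64` and `compareBytes` are illustrative helpers, not node-lmdb functions. It shows that little-endian bytes (native order on x86-64) mis-sort under byte-wise comparison, so a plain Buffer key would need to be big-endian.

```typescript
// Encode a non-negative safe integer as 8 bytes in the requested byte order.
function encodeU64(n: number, littleEndian: boolean): Uint8Array {
  const bytes = new Uint8Array(8);
  new DataView(bytes.buffer).setBigUint64(0, BigInt(n), littleEndian);
  return bytes;
}

// Byte-wise lexicographic comparison (memcmp on the common prefix, then length),
// i.e. the ordering LMDB applies to keys when MDB_INTEGERKEY is not set.
function compareBytes(a: Uint8Array, b: Uint8Array): number {
  const len = Math.min(a.length, b.length);
  for (let i = 0; i < len; i++) {
    if (a[i] !== b[i]) return a[i] - b[i];
  }
  return a.length - b.length;
}

// Little-endian: 256 is [00 01 00 ...] and 1 is [01 00 ...], so 256 sorts first (wrong).
console.log(compareBytes(encodeU64(256, true), encodeU64(1, true)) < 0);   // true
// Big-endian: the most significant byte comes first, so numeric order is preserved.
console.log(compareBytes(encodeU64(256, false), encodeU64(1, false)) > 0); // true
```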

Just a thought.

nwrbs, Aug 30 '17

@nwbrad I will think about what the best way is to implement this. Maybe there could be a special buffer key type which only allows 64-bit buffers, and I could enable the integer key optimization for that. What do you think?

Venemo, Sep 07 '17

I'm not sure about the best way. Personally, I think it should stick as close to the C LMDB implementation as possible, which would mean using a Uint64. A Uint64 would work as long as it was clearly documented that JS has a limit of a 53-bit integer: although the key is stored as a 64-bit int, the functional JS limit is a 53-bit integer. That's why I thought something like UintJSMax would work well as a name, since its unusual name would force people to read about it...

For the buffer method, you would take a JS number, convert it to a 64-bit buffer, and then store that as a UInt64? That could work, but it would likely be more confusing to users and would probably be more difficult to manage.

nwrbs, Sep 07 '17

@nwbrad I would not use a UInt64, because that would prevent users that use actual 64-bit numbers in other LMDB applications from using their existing databases with node-lmdb.

About the buffer idea:
The buffer would not need to be converted. The new keytype would enforce the use of a 64-bit (8-byte) buffer as a key, and would enable MDB_INTEGERKEY, and that's all there is to it.
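A minimal sketch of what "enforce an 8-byte buffer" could look like on the JavaScript side. `assertUint64Key` is hypothetical, not an existing node-lmdb API. The length check matters because the LMDB documentation for MDB_INTEGERKEY requires all keys in the database to be the same size.

```typescript
// Hypothetical guard for the proposed 64-bit key type: accept a key only if it is
// exactly 8 bytes, since MDB_INTEGERKEY requires every key in a database to be the same size.
// Node's Buffer extends Uint8Array, so a Buffer passes this check unchanged.
function assertUint64Key(key: Uint8Array): Uint8Array {
  if (key.byteLength !== 8) {
    throw new RangeError(`64-bit integer keys must be exactly 8 bytes, got ${key.byteLength}`);
  }
  return key; // no conversion: the bytes would be handed to LMDB as-is
}
```

For example, `assertUint64Key(new Uint8Array(8))` returns its argument, while a 4-byte key throws a RangeError.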

Venemo, Sep 07 '17

Sounds like it could work, especially if the buffer would not need to be converted and the values can be managed via a normal JS number variable.

nwrbs, Sep 08 '17

The values cannot be managed via a normal JS number variable because of what you said - it doesn't support 64-bit values properly.

Venemo, Sep 08 '17

Yeah, I was afraid of that. What I needed was the ability to have number keys, managed as numbers, into the trillions, which would not work with a 32-bit integer... Not supporting 64-bit integers is one of those unfortunate parts of JavaScript. I was hoping the solution would be able to utilize the ES6 Number.MAX_SAFE_INTEGER (2^53 - 1) and Number.isSafeInteger() [https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Number/MAX_SAFE_INTEGER]

Thanks for looking at it.

nwrbs, Sep 10 '17

@nwbrad Why do you not want to work with buffers?

Venemo, Sep 11 '17

My main apprehension about using buffers is the requirement to change the way a working application currently functions. Adding a new, untested methodology for storing existing keys, plus the overhead of converting between numbers and buffers, concerns me. I guess I see changing the key type and reloading the data as infinitely safer than having to change all the existing JavaScript code that accesses LMDB.

nwrbs, Sep 11 '17

@nwbrad Why is it untested? Why would you need to change all existing javascript code?

Venemo, Sep 12 '17

We use integer keys as unique identifiers for multiple databases (10+) and they are shown/used extensively throughout our application. By untested, I mean that the use of the new method in our code is untested and is not a drop-in replacement. So all aspects of our code would be untested with a new methodology. Switching to a JS safe large integer would require almost no changes to code, it would just be changing the key type and reloading the data.

nwrbs, Sep 12 '17

@nwbrad Well if you think that something like that 53-bit integer would help, I could implement that for you.

Venemo, Sep 13 '17

@nwbrad However it won't be a drop-in replacement because then the new database will be binary incompatible with the old one.

Venemo, Sep 13 '17

Venemo, thank you for the offer, but I'm not sure it's worth your effort if it is not binary compatible. I don't want you to go to the effort if it is not extending the existing base. My hope was for an enhancement of the current version that just offers JS-limited 64-bit integer support (within the JS safe integer boundaries), with everything else being the same.

nwrbs, Sep 13 '17

@nwbrad It is possible, but then you will have 32-bit and 64-bit keys mixed in the same database, so you will need to disable the MDB_INTEGERKEY optimization. Then it can be binary compatible with your old database, but add new keys as 64-bit (or 53-bit) integers.

Venemo, Sep 13 '17

OK, I completely misunderstood you. I thought you meant I could not use the same binary file (the node-lmdb module) to access both 32-bit and 64-bit integer keyed data files. I personally would not expect to have 32-bit and 64-bit keys in the same database, so for me this should not be a limitation.

Right or wrong, with LMDB I find I create multiple databases to help minimize the impact of having a single write lock per database and to improve optimization of data locality. I know it adds a bit to the memory overhead, but I think it's worth it.

I believe the change to allow 64-bit (or 53-bit) integers is worth it. Thank you!

nwrbs, Sep 13 '17

@nwbrad I said the new database (with 64-bit keys) would not be binary compatible with the old one (with 32-bit keys). I wasn't talking about the module itself.

Venemo, Sep 14 '17

Yep. My mistake, I misread it.

nwrbs, Sep 14 '17

@nwbrad Are you still interested in this? I've looked at the V8 docs, but it seems it doesn't have a suitable data type for storing these keys: https://v8docs.nodesource.com/node-8.9/dc/d0a/classv8_1_1_value.html

Venemo, Mar 25 '18

The V8 Integer::New only takes a 32-bit number, unfortunately (even though it returns its value in 64-bit).

Venemo, Mar 25 '18

Yes, I'm still interested. I want V8/JavaScript with LMDB to be able to use a JavaScript number key with a maximum value of MAX_SAFE_INTEGER (a 53-bit integer). Are you saying that V8/Node does not support numbers up to MAX_SAFE_INTEGER as a number?
I was expecting something like: a JS number (up to MAX_SAFE_INTEGER) translates to an LMDB C 64-bit integer, and the reverse. Is that not possible? I don't think it would need to be stored as an integer in JS, just as a number... Am I missing something?
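The "JS number to 64-bit integer and back" translation described above can be sketched in plain TypeScript with DataView. `numberToU64` and `u64ToNumber` are hypothetical names, and little-endian order is an assumption chosen to match common hosts such as x86-64; this is an illustration of the idea, not how node-lmdb implements it.

```typescript
// Hypothetical converters between a JS number key and a 64-bit unsigned integer slot.
// Little-endian is assumed to match common hosts (x86-64); a big-endian host
// would need LITTLE_ENDIAN = false to match its native order.
const LITTLE_ENDIAN = true;

function numberToU64(n: number): Uint8Array {
  if (!Number.isSafeInteger(n) || n < 0) {
    throw new RangeError(`${n} is not an integer in [0, Number.MAX_SAFE_INTEGER]`);
  }
  const bytes = new Uint8Array(8);
  new DataView(bytes.buffer).setBigUint64(0, BigInt(n), LITTLE_ENDIAN);
  return bytes;
}

function u64ToNumber(bytes: Uint8Array): number {
  if (bytes.byteLength !== 8) throw new RangeError("expected an 8-byte key");
  const value = new DataView(bytes.buffer, bytes.byteOffset, 8).getBigUint64(0, LITTLE_ENDIAN);
  // A key written by another (non-JS) LMDB application may exceed 2^53 - 1;
  // converting it to a Number would silently round, so refuse instead.
  if (value > BigInt(Number.MAX_SAFE_INTEGER)) {
    throw new RangeError(`stored key ${value} exceeds Number.MAX_SAFE_INTEGER`);
  }
  return Number(value);
}

const key = 3_000_000_000_000;
console.log(u64ToNumber(numberToU64(key)) === key); // true
```

The read-side check reflects the concern raised earlier in the thread: databases written by other applications may contain full 64-bit keys that a JS number cannot hold.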

nwrbs, Mar 25 '18

@nwbrad Unfortunately. Some digging also turned up this:
https://github.com/thlorenz/v8-perf/issues/3#issuecomment-45545604

Relevant quote:

V8 integer sizes:

  • In 64-bit up to 32-bit signed
  • In 32-bit up to 31-bit signed

Outside these ranges, numbers are represented as boxed doubles (in special case like locals of optimized code, pure float arrays, doubles are stored immediately)

This finding concurs with what I found in the V8 docs.

So, we could perhaps get the 64-bit integer to/from the JavaScript side as a Number (which is in fact a double), but I'm not sure that is actually a good idea. Would we lose precision if we stored MAX_SAFE_INTEGER as a double?

Venemo, Mar 25 '18

I think we would be fine representing int as a Number up to MAX_SAFE_INTEGER.

From Mozilla: "The reasoning behind that number is that JavaScript uses double-precision floating-point format numbers as specified in IEEE 754 and can only safely represent numbers between -(2^53 - 1) and 2^53 - 1. Safe in this context refers to the ability to represent integers exactly and to correctly compare them."

MAX_SAFE_INTEGER-Mozilla
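The quoted boundary can be checked directly. This small sketch (plain TypeScript, no node-lmdb involved) shows that MAX_SAFE_INTEGER itself is stored exactly as a double, and that precision loss only starts past it, where adjacent doubles are 2 apart and distinct integer keys start to collide.

```typescript
const MAX = Number.MAX_SAFE_INTEGER; // 2^53 - 1 = 9007199254740991

// Up to MAX every integer has its own exact double, so neighbouring keys stay distinct.
console.log(MAX - 1 === MAX);               // false
// Past MAX, doubles are spaced 2 apart: 2^53 + 1 rounds to 2^53, so two keys collide.
console.log(MAX + 2 === MAX + 1);           // true
console.log(Number.isSafeInteger(MAX));     // true
console.log(Number.isSafeInteger(MAX + 1)); // false
```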

nwrbs, Mar 25 '18

@nwbrad Alright, I guess I can do it like this!

Venemo, Mar 26 '18

Great, thank you! I think it will allow your tool to use LMDB to the limits of JavaScript. Maybe someday JavaScript will have true 64-bit integers, but until then I imagine this will be more than sufficient for 99.9% of users. I know it will resolve my concerns.

nwrbs, Mar 26 '18

I added a pull request that addresses this: https://github.com/Venemo/node-lmdb/pull/132

erichocean, Jun 01 '18

@Venemo What if we implement this with the new BigInt type? The logic would be: if the key is a bigint, make it a binary buffer; but if it's a bigint and the db key type is set to integer via an additional option, try to fit the bigint in 64 bits. If it can't fit, fail the insert with a descriptive message of some sort. I could create a PR for this if you agree with the logic above.
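The "fit the bigint in 64 bits or fail" rule proposed above could look roughly like this in TypeScript. `bigintToU64Key` is a hypothetical name, and little-endian output is an assumption matching x86-64 native order. The explicit range check is needed because DataView.setBigUint64 silently wraps out-of-range values modulo 2^64 instead of throwing.

```typescript
const U64_MAX = (1n << 64n) - 1n; // 18446744073709551615

// Hypothetical encoder for the proposed rule: in a database whose key type is set to
// integer, a bigint key must fit in an unsigned 64-bit slot or the insert is rejected.
function bigintToU64Key(key: bigint): Uint8Array {
  if (key < 0n || key > U64_MAX) {
    throw new RangeError(`key ${key} does not fit in an unsigned 64-bit integer`);
  }
  const bytes = new Uint8Array(8);
  // Little-endian assumed (x86-64 native order). Without the check above,
  // setBigUint64 would wrap e.g. 2^64 to 0 with no error.
  new DataView(bytes.buffer).setBigUint64(0, key, true);
  return bytes;
}
```

Unlike the Number-based approach discussed earlier, this covers the full unsigned 64-bit range, so keys written by other LMDB applications could be read back without loss.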

matthewaveryusa, Feb 18 '19

@matthewaveryusa Sounds good, though then the feature needs to be #ifdefed in such a way that it only works on node versions that support bigint. Looking forward to the PR.

Venemo, Feb 18 '19