ijson
No support for arbitrary precision?
serde_json includes an arbitrary_precision feature flag which enables support for large integers. It seems an equivalent isn't available here, making i64 the largest possible integer.
I ran into this while trying to use RedisJSON. I posted an issue there originally, but after tracking it down a bit more, it seems the underlying problem is the lack of support here:
https://github.com/RedisJSON/RedisJSON/issues/869#issuecomment-1321193081
Would it be possible to implement something similar here?
It's possible in principle. The issue is that arbitrary precision integers are not natively supported by the serde data model. Serde-json works around this using what can only be described as a hack. Some details of that hack are not publicly exposed from serde-json, so we cannot make this work without duplicating that code here, and relying on implementation details of serde-json.
See uses of https://github.com/serde-rs/json/blob/993e7a6eeaa89b39329d245e63879d7913ed1a41/src/number.rs#L18
yep. I did see that file. It felt like one big "if/else" with that feature flag splattered all over
but are you saying a similar approach would be acceptable here? I would be willing to try and hack it myself if so
Yes, but you'll need to uphold the existing guarantees of the INumber type (e.g. that you can do comparisons between two numbers regardless of their internal representation).
Just to add support to this: our use case is a bit different namely that we want the exact string that is present in the input. We have our own mechanism for parsing decimals from strings & internally representing them in our application. Having them as f64 requires us to first convert it back to a string & then parse it - which means we're losing both precision & performance without this feature. As for the implementation, what would it take from serde-json to support this? I'm not sure how this works but is it not possible to just store the input as a string inside a new variant, perhaps copying a lot of functionality if required?
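To make the precision concern concrete, here is a minimal sketch using only the Rust standard library (not ijson or serde_json). The helper name round_trip_via_f64 is hypothetical. It parses a number literal into an f64 and prints it back, which is roughly what an application has to do today if it wants the text of a decimal.

```rust
/// Parses a number literal as f64 and formats it back to text.
/// Hypothetical helper for illustration; not part of ijson.
fn round_trip_via_f64(literal: &str) -> Option<String> {
    let value: f64 = literal.parse().ok()?;
    Some(value.to_string())
}

fn main() {
    let input = "0.10000000000000000001";
    let output = round_trip_via_f64(input).expect("valid number literal");
    // An f64 cannot hold the trailing digits, so the text that comes
    // back differs from the text that went in.
    println!("{input} -> {output}");
    assert_ne!(input, output);
}
```

Short literals such as 0.5 survive the round trip unchanged; the loss only shows up once the input has more significant digits than an f64 can hold.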
The way INumber works in ijson (you can read the README for more detail on this) is that it uses a number of different representations for numbers internally whilst hiding that from the user (i.e. the API is the same regardless).
At the moment, the implementation first splits inputs into those with a decimal point, and those without. Numbers with a decimal point are always represented using an f64, whilst numbers without will use an integer type depending on their size.
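As a rough illustration of that split, here is a standard-library-only sketch. It is not ijson's actual code: the Decoded enum and the function name are hypothetical, only i64 and u64 are modelled as integer sizes, and returning None for integers beyond u64 is an assumption made to keep the example small.

```rust
/// Hypothetical stand-in for the internal representations described above.
#[derive(Debug, PartialEq)]
enum Decoded {
    Signed(i64),
    Unsigned(u64),
    Float(f64),
}

/// Sketch of the "decimal point or not" split; not ijson's real decoder.
fn decode_without_arbitrary_precision(literal: &str) -> Option<Decoded> {
    if literal.contains('.') {
        // Decimal point present: always an f64, so long fractions get rounded.
        return literal.parse::<f64>().ok().map(Decoded::Float);
    }
    if let Ok(value) = literal.parse::<i64>() {
        return Some(Decoded::Signed(value));
    }
    if let Ok(value) = literal.parse::<u64>() {
        return Some(Decoded::Unsigned(value));
    }
    // Assumption: the thread does not say what happens past u64::MAX,
    // so this sketch simply gives up.
    None
}

fn main() {
    for literal in ["42", "18446744073709551615", "1.5", "18446744073709551616"] {
        println!("{literal} -> {:?}", decode_without_arbitrary_precision(literal));
    }
}
```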
In order to support arbitrary precision, ijson would need to support a representation for both large integers and large decimal numbers. However, ideally it would still use a compact representation for small integers.
My suggestion would be to use a decoding algorithm like the following:
- Check if the input has a decimal point
- If so:
  - Convert the input into a canonical representation (we want 10.0 and 1.0e1 to be represented using the same string so that IValue comparisons are reasonably performant)
  - Store the canonicalized string, or an equivalent representation using base 10
  - Store an extra flag indicating that the value originally has a decimal point (or else retain the decimal point in the canonical representation, but this will make equality comparisons slower)
- Otherwise:
  - Convert the input into a canonical representation (we want 10 and 1e1 to be represented the same way)
  - Check if the input can be represented in a u64 or i64, and if so follow the existing decoding path for those types.
  - Otherwise, store the canonicalized string representation.
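The steps above could be sketched roughly as follows, using only the standard library. Everything here is an assumption for illustration: the Canonical struct (significant digits plus a base-10 exponent), the Decoded enum, and the function names are hypothetical and not part of ijson. Validation is deliberately minimal, so some inputs that JSON forbids (such as leading zeros) are accepted.

```rust
/// Hypothetical canonical form: value = digits * 10^exponent, where
/// `digits` has no leading or trailing zeros (empty means zero).
#[derive(Debug, PartialEq, Clone)]
struct Canonical {
    negative: bool,
    digits: String,
    exponent: i64,
}

#[derive(Debug, PartialEq)]
enum Decoded {
    Signed(i64),
    Unsigned(u64),
    Big { canonical: Canonical, had_decimal_point: bool },
}

fn canonicalize(literal: &str) -> Option<Canonical> {
    let (negative, rest) = match literal.strip_prefix('-') {
        Some(rest) => (true, rest),
        None => (false, literal),
    };
    // Split off an optional exponent such as "e1" or "E-3".
    let (mantissa, mut exponent) = match rest.find(|c: char| c == 'e' || c == 'E') {
        Some(i) => (&rest[..i], rest[i + 1..].parse::<i64>().ok()?),
        None => (rest, 0),
    };
    let (int_part, frac_part) = match mantissa.split_once('.') {
        Some((_, frac)) if frac.is_empty() => return None, // "1." is rejected
        Some(parts) => parts,
        None => (mantissa, ""),
    };
    let all_digits = |s: &str| s.bytes().all(|b| b.is_ascii_digit());
    if int_part.is_empty() || !all_digits(int_part) || !all_digits(frac_part) {
        return None;
    }
    // Fold the fraction into the digit string, then move trailing zeros
    // into the exponent so that "10.0" and "1.0e1" end up identical.
    let mut digits = format!("{int_part}{frac_part}");
    exponent = exponent.checked_sub(frac_part.len() as i64)?;
    let trimmed_len = digits.trim_end_matches('0').len();
    exponent = exponent.checked_add((digits.len() - trimmed_len) as i64)?;
    digits.truncate(trimmed_len);
    let digits = digits.trim_start_matches('0').to_string();
    if digits.is_empty() {
        // All spellings of zero (including "-0") share one canonical form.
        return Some(Canonical { negative: false, digits, exponent: 0 });
    }
    Some(Canonical { negative, digits, exponent })
}

/// Returns the compact integer representation when the value fits.
fn to_small_integer(c: &Canonical) -> Option<Decoded> {
    if c.digits.is_empty() {
        return Some(Decoded::Signed(0));
    }
    if c.exponent < 0 || c.exponent > 19 {
        return None; // not an integer, or far too large for 64 bits
    }
    let sign = if c.negative { "-" } else { "" };
    let expanded = format!("{sign}{}{}", c.digits, "0".repeat(c.exponent as usize));
    if let Ok(value) = expanded.parse::<i64>() {
        return Some(Decoded::Signed(value));
    }
    expanded.parse::<u64>().ok().map(Decoded::Unsigned)
}

fn decode(literal: &str) -> Option<Decoded> {
    let canonical = canonicalize(literal)?;
    let had_decimal_point = literal.contains('.');
    if !had_decimal_point {
        if let Some(small) = to_small_integer(&canonical) {
            return Some(small);
        }
    }
    Some(Decoded::Big { canonical, had_decimal_point })
}

fn main() {
    for literal in ["10", "1e1", "10.0", "1.0e1", "18446744073709551616"] {
        println!("{literal} -> {:?}", decode(literal));
    }
}
```

Storing digits and an exponent rather than a formatted string is one way to get "an equivalent representation using base 10"; a real implementation would also need to pick a compact in-memory layout, which this sketch ignores.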
We will need equality to still work regardless of representation, so 1.0 still needs to compare equal to 1 even if they are represented differently. We also need to handle the case where an INumber is constructed directly from an f64. In that case we can either implement some complicated logic to convert inputs into f64s only when they can be exactly represented, or else we can drop the f64 representation entirely when arbitrary precision is enabled, and convert f64 inputs into the closest representable JSON number (i.e. a decimal number).
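A small sketch of both requirements, assuming the second option (no f64 representation). The Number enum and its methods are hypothetical, not ijson's INumber. The f64 conversion relies on Rust's standard Display formatting, which prints the shortest decimal that parses back to the same f64 and never uses exponent notation; treating that as the "closest" decimal is one possible reading of the suggestion above.

```rust
/// Hypothetical number type with several internal representations.
#[derive(Debug, Clone)]
enum Number {
    Signed(i64),
    Unsigned(u64),
    /// Plain decimal text without an exponent, e.g. "-12.50".
    Decimal(String),
}

impl Number {
    /// Converts an f64 into decimal text instead of storing the float.
    fn from_f64(value: f64) -> Option<Number> {
        if !value.is_finite() {
            return None; // NaN and infinities are not JSON numbers
        }
        Some(Number::Decimal(value.to_string()))
    }

    /// Representation-independent key: (negative, integer digits,
    /// fraction digits) with redundant zeros removed.
    fn key(&self) -> (bool, String, String) {
        let text = match self {
            Number::Signed(i) => i.to_string(),
            Number::Unsigned(u) => u.to_string(),
            Number::Decimal(s) => s.clone(),
        };
        let (negative, unsigned) = match text.strip_prefix('-') {
            Some(rest) => (true, rest),
            None => (false, text.as_str()),
        };
        let (int_part, frac_part) = unsigned.split_once('.').unwrap_or((unsigned, ""));
        let int_part = int_part.trim_start_matches('0').to_string();
        let frac_part = frac_part.trim_end_matches('0').to_string();
        let is_zero = int_part.is_empty() && frac_part.is_empty();
        // "-0" and "0" must compare equal, so zero is never negative.
        (negative && !is_zero, int_part, frac_part)
    }
}

impl PartialEq for Number {
    fn eq(&self, other: &Self) -> bool {
        self.key() == other.key()
    }
}

fn main() {
    // 1.0 stored as decimal text still equals the integer 1.
    assert_eq!(Number::Decimal("1.0".to_string()), Number::Signed(1));
    // An f64 input becomes decimal text and compares like any other number.
    assert_eq!(Number::from_f64(2.5), Some(Number::Decimal("2.50".to_string())));
    println!("{:?}", Number::from_f64(0.1 + 0.2));
}
```

Building the key on every comparison allocates, so this is only meant to show the semantics. Canonicalizing once at construction time, as suggested earlier in the thread, would keep comparisons cheap.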