ijson

No support for arbitrary precision?

Open naps62 opened this issue 2 years ago • 5 comments

serde_json includes an arbitrary_precision feature flag which enables support for large integers. It seems an equivalent isn't available here, making i64 the largest possible integer.

I ran into this while trying to use RedisJSON. I posted an issue there originally, but after tracking it down a bit further, it seems the underlying problem is the lack of support here.

https://github.com/RedisJSON/RedisJSON/issues/869#issuecomment-1321193081

Would it be possible to implement something similar here?

naps62 avatar Nov 20 '22 19:11 naps62

It's possible in principle. The issue is that arbitrary precision integers are not natively supported by the serde data model. Serde-json works around this using what can only be described as a hack. Some details of that hack are not publicly exposed from serde-json, so we cannot make this work without duplicating that code here, and relying on implementation details of serde-json.

See uses of https://github.com/serde-rs/json/blob/993e7a6eeaa89b39329d245e63879d7913ed1a41/src/number.rs#L18

Diggsey avatar Nov 20 '22 23:11 Diggsey

yep. I did see that file. It felt like one big "if/else" with that feature flag splattered all over

but are you saying a similar approach would be acceptable here? I would be willing to try and hack it myself if so

naps62 avatar Nov 21 '22 09:11 naps62

Yes, but you'll need to uphold the existing guarantees of the INumber type. (eg. that you can do comparisons between two numbers regardless of their internal representation)

Diggsey avatar Nov 21 '22 11:11 Diggsey

Just to add support for this: our use case is a bit different, namely that we want the exact string that is present in the input. We have our own mechanism for parsing decimals from strings and representing them internally in our application. Having them as f64 requires us to first convert the value back to a string and then parse it, which means we lose both precision and performance without this feature. As for the implementation, what would it take from serde-json to support this? I'm not sure how this works, but is it not possible to just store the input as a string inside a new variant, perhaps copying a lot of functionality if required?
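
To illustrate the precision loss described above, here is a minimal sketch using only the Rust standard library. The helper name `round_trip_via_f64` is hypothetical and is not part of ijson or serde-json; it just parses the text as an f64 and formats it back.

```rust
/// Hypothetical helper (not an ijson or serde-json API): parse the input as an
/// f64 and format it back to text, which is what a caller is forced to do when
/// only an f64 is available.
fn round_trip_via_f64(input: &str) -> Option<String> {
    // Parsing rounds the decimal to the nearest representable f64.
    let value: f64 = input.parse().ok()?;
    // Formatting prints the shortest text that parses back to the same f64,
    // which need not be the text we started with.
    Some(value.to_string())
}
```

With an input such as "0.1234567890123456789", the returned string has fewer digits than the input, so the original text cannot be recovered. A short value like "0.5" survives unchanged.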

utkarshgupta137 avatar May 23 '23 18:05 utkarshgupta137

The way INumber works in ijson (you can read the README for more detail on this) is that it uses a number of different representations for numbers internally whilst hiding that from the user (ie. the API is the same regardless).

At the moment, the implementation first splits inputs into those with a decimal point, and those without. Numbers with a decimal point are always represented using an f64, whilst numbers without will use an integer type depending on their size.

In order to support arbitrary precision, ijson would need to support a representation for both large integers and large decimal numbers. However, ideally it would still use a compact representation for small integers.

My suggestion would be to use a decoding algorithm like the following:

  • Check if the input has a decimal point
  • If so:
    • Convert the input into a canonical representation (we want 10.0 and 1.0e1 to be represented using the same string so that IValue comparisons are reasonably performant)
    • Store the canonicalized string, or an equivalent representation using base 10
    • Store an extra flag indicating that the value originally had a decimal point (or else retain the decimal point in the canonical representation, but this will make equality comparisons slower)
  • Otherwise:
    • Convert the input into a canonical representation (we want 10 and 1e1 to be represented the same way)
    • Check if the input can be represented in a u64 or i64, and if so follow the existing decoding path for those types.
    • Otherwise, store the canonicalized string representation.
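
The steps above could be sketched roughly as follows. This is only an illustration under assumptions: the `Canonical` and `Decoded` types and the `canonicalize` and `decode` functions are hypothetical names, not ijson's actual internals, and the canonical form chosen here (a digit string without leading or trailing zeros plus a base-10 exponent) is just one possible choice. It also collapses "-0" into plain zero for simplicity.

```rust
/// Hypothetical canonical form: value = (-1)^negative * digits * 10^exponent,
/// where `digits` has no leading or trailing zeros ("0" for zero).
#[derive(Debug, PartialEq)]
struct Canonical {
    negative: bool,
    digits: String,
    exponent: i64,
}

/// Hypothetical decoded number; not ijson's real representation.
#[derive(Debug, PartialEq)]
enum Decoded {
    I64(i64),
    U64(u64),
    Big { canonical: Canonical, had_decimal_point: bool },
}

/// Canonicalize a JSON number so that e.g. "10.0" and "1.0e1" agree.
fn canonicalize(input: &str) -> Option<Canonical> {
    let (negative, rest) = match input.strip_prefix('-') {
        Some(rest) => (true, rest),
        None => (false, input),
    };
    // Split off the optional exponent part.
    let (mantissa, exp) = match rest.find(|c: char| c == 'e' || c == 'E') {
        Some(i) => (&rest[..i], rest[i + 1..].parse::<i64>().ok()?),
        None => (rest, 0),
    };
    // Split the mantissa into integer and fractional digits.
    let (int_part, frac_part) = match mantissa.split_once('.') {
        Some((_, "")) => return None,
        Some(parts) => parts,
        None => (mantissa, ""),
    };
    let joined = [int_part, frac_part].concat();
    if int_part.is_empty() || !joined.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    // Moving the decimal point to the end lowers the exponent.
    let exponent = exp.checked_sub(frac_part.len() as i64)?;
    let no_leading = joined.trim_start_matches('0');
    let trimmed = no_leading.trim_end_matches('0');
    if trimmed.is_empty() {
        // All digits were zero (this sketch ignores the sign of zero).
        return Some(Canonical { negative: false, digits: "0".to_string(), exponent: 0 });
    }
    // Each stripped trailing zero raises the exponent by one.
    let stripped = (no_leading.len() - trimmed.len()) as i64;
    Some(Canonical {
        negative,
        digits: trimmed.to_string(),
        exponent: exponent.checked_add(stripped)?,
    })
}

/// Follow the decoding steps: decimals stay canonical strings, whole
/// numbers use i64/u64 when they fit, everything else falls back to `Big`.
fn decode(input: &str) -> Option<Decoded> {
    let had_decimal_point = input.contains('.');
    let canonical = canonicalize(input)?;
    // Anything with more than 20 trailing zeros cannot fit in a u64.
    if !had_decimal_point && (0..=20).contains(&canonical.exponent) {
        let mut text = String::new();
        if canonical.negative {
            text.push('-');
        }
        text.push_str(&canonical.digits);
        text.extend(std::iter::repeat('0').take(canonical.exponent as usize));
        if let Ok(v) = text.parse::<i64>() {
            return Some(Decoded::I64(v));
        }
        if let Ok(v) = text.parse::<u64>() {
            return Some(Decoded::U64(v));
        }
    }
    Some(Decoded::Big { canonical, had_decimal_point })
}
```

In this sketch, `decode("10")` and `decode("1e1")` both take the compact i64 path, while `decode("10.0")` and `decode("1.0e1")` produce the same canonical digits and exponent with the decimal-point flag set.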

We will need equality to still work regardless of representation, so 1.0 still needs to compare equal to 1 even if they are represented differently. We also need to handle the case where an INumber is constructed directly from an f64. In that case we can either implement some complicated logic to convert inputs into f64s only when they can be exactly represented, or else drop the f64 representation entirely when arbitrary precision is enabled and convert f64 inputs into the closest representable JSON number (ie. decimal number).
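
As a rough illustration of the equality requirement, the sketch below compares a compact integer against a canonical decimal by bringing both to the same (sign, digits, exponent) form. The `Num` type and `canonical_parts` function are hypothetical and only stand in for whatever INumber would do internally; the sketch assumes the `Decimal` variant is already canonical (no trailing zeros in `digits`, zero stored as "0" with exponent 0).

```rust
/// Hypothetical two-representation number; not ijson's INumber.
#[derive(Debug, Clone)]
enum Num {
    /// Compact representation for small whole numbers.
    Int(i64),
    /// Canonical decimal: value = (-1)^negative * digits * 10^exponent.
    Decimal { negative: bool, digits: String, exponent: i64 },
}

/// Bring either representation to the same canonical triple.
fn canonical_parts(n: &Num) -> (bool, String, i64) {
    match n {
        Num::Int(0) => (false, "0".to_string(), 0),
        Num::Int(v) => {
            // Strip trailing zeros into the exponent, e.g. -250 -> ("25", 1).
            let text = v.unsigned_abs().to_string();
            let digits = text.trim_end_matches('0');
            (*v < 0, digits.to_string(), (text.len() - digits.len()) as i64)
        }
        Num::Decimal { negative, digits, exponent } => (*negative, digits.clone(), *exponent),
    }
}

impl PartialEq for Num {
    /// Equality looks only at the numeric value, not at the representation.
    fn eq(&self, other: &Self) -> bool {
        canonical_parts(self) == canonical_parts(other)
    }
}
```

Here `Num::Int(1)` compares equal to the canonical form of 1.0 (digits "1", exponent 0). A real implementation would likely avoid allocating strings on every comparison; that cost is ignored in this sketch.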

Diggsey avatar May 25 '23 11:05 Diggsey