Elaborate on handling of JSON builtin types `integer` and `double`

Open VladimirAlexiev opened this issue 3 years ago • 11 comments

The spec doesn't describe explicitly enough what happens with JSON builtin types integer and double.

  • https://w3c.github.io/json-ld-syntax/#typed-values talks about 64-bit doubles and says "such as" without being exhaustive: "By using a native JSON type such as number, true, or false."
  • https://w3c.github.io/json-ld-syntax/#type-coercion talks about converting strings, not numerics
  • #335 describes an unpleasant situation where an integer (e.g. 3 instead of "3") cannot be used in a URL
  • this example at the playground produces ex:pi "3.14E0"^^xsd:double (ok) but ex:two "2"^^xsd:integer ?!?!?
{"@context":{"@vocab":"http://example.org/"},
  "pi": 3.14, "two": 2.0000000000000001}
  • remove one decimal zero and you get ex:two "2.000000000000001E0"^^xsd:double which means the datatype varies with the lexical precision
  • In contrast, Turtle is consistent: 2.0 and 2.0000000000000001 mean xsd:decimal
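The dependence on lexical precision can be reproduced outside the playground. A minimal Python sketch, assuming a parser that stores non-integer JSON literals as IEEE 754 doubles (Python's `json` module does this by default); the variable names are illustrative only:

```python
import json

# The literal from the example: the nearest double is exactly 2.0,
# so the fractional part is already gone after parsing.
a = json.loads('{"two": 2.0000000000000001}')["two"]

# One decimal zero fewer: the fraction is large enough to survive.
b = json.loads('{"two": 2.000000000000001}')["two"]

print(a == 2.0, a.is_integer())  # True True
print(b == 2.0, b)               # False 2.000000000000001
```

Once `a` has been parsed, nothing distinguishes it from the literal `2`, which would explain why it ends up as xsd:integer.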

@msporny @gkellogg I think the spec should be more explicit about what implicit conversions are applied to JSON builtin types, and give some warnings about the examples above.

VladimirAlexiev avatar Feb 01 '22 07:02 VladimirAlexiev

As far as I can see, handling of JSON integers is not described in the spec.

Let's look at some examples on the playground:

  • 123456789012345678901 -> 123456789012345683968 xsd:integer: Whaaat? the trailing digits are totally wrong
  • 1234567890123456789.01 -> 1234567890123456768 xsd:integer. Whaaat? I typed a double
  • 12345678901234567.89012 -> 12345678901234568 xsd:integer. Whaaat? Double I said!
  • 1234567890123456789012 -> 1.234567890123457E21 xsd:double. Kind of ok, I guess.
  • -0.000001 -> -1.0E-6 xsd:double. ok
  • -0.0000001 -> -0 xsd:integer. WHAAAAT? Couldn't fit -1.0E-7 in a double?
    • I guess -0 is in the lexical space of https://www.w3.org/TR/xmlschema-2/#integer but the canonical value is 0
  • 0.0000001 -> 0 xsd:integer. WHAAAAT? Couldn't fit 1.0E-7 in a double?
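The first three results are what one would expect if the number is held as a 64-bit double before it is serialized. A small Python sketch, using `float()` to stand in for a double-only JSON parser (an assumption about how the playground's JavaScript engine stores the value):

```python
original = 123456789012345678901

# float() rounds to the nearest IEEE 754 double; int() then exposes the
# exact integer value that double holds.
stored = int(float(original))

print(stored)             # 123456789012345683968
print(stored - original)  # 5067

# At this magnitude a double has no room for a fractional part at all.
print(float("1234567890123456789.01").is_integer())  # True
```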

xsd:integer is infinite precision, so I think the builtin JSON integers should be emitted as xsd:long?

The spec should warn: don't EVER use native JSON numbers, especially when it comes to large numbers. Use string as transfer format, and explicitly type them.
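One way to follow this advice is a JSON-LD value object, which carries the number as a string with an explicit datatype. A sketch in Python; the property name `num` is hypothetical, and xsd:integer is just one possible choice of datatype:

```python
import json

XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"

doc = {
    "@context": {"@vocab": "http://example.org/"},
    # "num" is a made-up property; the digits travel as a JSON string.
    "num": {"@value": "123456789012345678901", "@type": XSD_INTEGER},
}

# A plain JSON round trip cannot alter the digits of a string.
round_tripped = json.loads(json.dumps(doc))
print(round_tripped["num"]["@value"])  # 123456789012345678901
```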

VladimirAlexiev avatar Feb 01 '22 08:02 VladimirAlexiev

"The spec should warn: don't EVER use native JSON numbers, especially when it comes to large numbers."

Yes, this is a known issue in JSON and is elaborated in the JSON RFC in the section about Numbers:

https://datatracker.ietf.org/doc/html/rfc7159#section-6

You are not given any sort of precision guarantees by the JSON standard, just some vague handwaving on what might work. If you're shocked by what 64-bit computers do, you should see what JSON implementations written for 16-bit microcontrollers do to double values. :)

I do think it would be a good idea for the JSON-LD spec to warn against using numbers and doubles. We can't go as far as saying "don't EVER use native JSON numbers", because there are plenty of use cases where that's a legitimate thing to do.

msporny avatar Feb 01 '22 15:02 msporny

"xsd:integer is infinite precision, so I think the builtin JSON integers should be emitted as xsd:long?"

We can't do that, because it is possible to write a JSON processor that supports infinite precision (bounded only by memory). Welcome to our hell, @VladimirAlexiev. :)
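Python's standard `json` module is one example of such a parser: it keeps integer literals as arbitrary-precision `int` values by default. A short sketch; the `parse_int=float` variant is only there to mimic a double-only parser for comparison:

```python
import json

literal = "123456789012345678901"

exact = json.loads(literal)                   # arbitrary-precision int
lossy = json.loads(literal, parse_int=float)  # forced through a double

print(type(exact).__name__, exact)           # int 123456789012345678901
print(type(lossy).__name__, lossy == exact)  # float False
```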

msporny avatar Feb 01 '22 15:02 msporny

@VladimirAlexiev, did you look at steps 10 and 11 of the Object to RDF algorithm? I think the answer to your question is here.

The -0.0000001 and 0.0000001 cases in your example seem to be a bug in jsonld.js. The Ruby and Python implementations produce -1e-7 and 1e-7 as expected.

@msporny by the way, step 10 of said algorithm reads:

Otherwise, if value is a number with a non-zero fractional part (...) or an absolute value greater or equal to 10^21, (...) convert value to a string in canonical lexical form of an xsd:double

so I don't believe that an implementation producing arbitrary large integers would be compliant.
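The quoted rule can be sketched as a small Python function. This is only an illustration of the untyped-number case as quoted above, not the full algorithm: it ignores explicit datatypes and the canonical lexical forms, and `classify_number` is a hypothetical name.

```python
XSD = "http://www.w3.org/2001/XMLSchema#"


def classify_number(value: float) -> str:
    """Pick the datatype IRI the quoted rule implies for an untyped number."""
    # Non-zero fractional part, or magnitude at or above 10^21: xsd:double.
    if value % 1 != 0 or abs(value) >= 1e21:
        return XSD + "double"
    # Everything else is treated as an integer.
    return XSD + "integer"


print(classify_number(3.14))                            # ...#double
print(classify_number(-0.0000001))                      # ...#double
print(classify_number(1234567890123456789012.0))        # ...#double
print(classify_number(float("123456789012345678901")))  # ...#integer
```

Under this reading, -0.0000001 has a non-zero fractional part and should come out as xsd:double, which fits the Ruby and Python results mentioned above.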

pchampin avatar Feb 01 '22 15:02 pchampin

Additionally, the 10^21 and other related spec text was informed by the ECMAScript spec: https://tc39.es/ecma262/#sec-numeric-types-number-tostring as well as RFC 8785 and RFC 7493. Getting interop and spec text "right" with numbers in JSON has been historically challenging and there have been many debates concerning practicality vs. mathematical expression.

dlongley avatar Feb 01 '22 16:02 dlongley

Ok guys, add whatever provisos and warnings you see fit in the spec, but warn poor folks to be very careful when using builtin JSON numbers, especially for large and small numbers (by absolute value).

I have no idea what the internal representation of 123456789012345678901 is in various JS, ECMA etc implementations (or indeed, little desire to learn). But when that's converted to "123456789012345683968"^^xsd:integer (an infinite precision datatype) and the output differs by 5067 from the input, that makes me lose faith in the numeric aspects of XSD, RDF, JSON-LD.

Java is better, seems to use BigNums (jena riot, jsonld-java):

$ echo '{"@context":{"@vocab":"http://example.org/"},"num":1234567890123456789012345678.90}' | riot -syntax jsonld -out ttl -
_:b0    <http://example.org/num>  1.2345678901234569E27 .
$ echo '{"@context":{"@vocab":"http://example.org/"},"num":123456789012345678901234567890}' | riot -syntax jsonld -out ttl -
_:b0    <http://example.org/num>  123456789012345678901234567890 .

VladimirAlexiev avatar Feb 01 '22 18:02 VladimirAlexiev

@pchampin wrote:

so I don't believe that an implementation producing arbitrary large integers would be compliant.

It wouldn't be, that's true. I was just commenting about how the JSON spec leaves this particular detail (about number precision) up to the implementer and stays silent on what's acceptable and what isn't... and that JSON-LD inherits that imprecision.

msporny avatar Feb 01 '22 19:02 msporny

@VladimirAlexiev actually, the issue with 123456789012345678901 is a bug (since it is < 10^21) in jsonld.js. Again, the Ruby and Python implementations do the right thing.

Note that, the way the spec is defined, a compliant JSON-LD processor will never produce an xsd:integer that is not exactly equal to the initial JSON number. Whenever it produces an xsd:double, on the other hand, some information might have been lost -- but I consider that to be expected, xsd:double having a limited precision.

About the provisos and warnings, there is a dedicated section about Data Round Tripping. But I sympathize with the fact that this information may not be as prominent as it should be.

pchampin avatar Feb 02 '22 09:02 pchampin

A link from the syntax spec section to the API spec section would already help a lot.

VladimirAlexiev avatar Feb 02 '22 21:02 VladimirAlexiev

Appendix B.1.3 of the syntax document could be improved by:

  • adding a link to the specific section of the API document (it currently only refers to the API doc as a whole)
  • adding a note about big integers
  • replacing "full round-tripping" with "round-tripping", to lower the reader's expectation

@VladimirAlexiev is there another place in the syntax document where you feel such a warning would be required?

pchampin avatar Feb 03 '22 08:02 pchampin

Summary: Update description of round-tripping in B.1.3 with a note about lossy conversion discouraging the use of native numbers where this might be an issue.

gkellogg avatar Jun 01 '22 21:06 gkellogg