Integers turned into longs

Open ajtucker opened this issue 3 years ago • 3 comments

When I add extra metadata in -metadata.json, I expect the metadata to be parsed as per the JSON-LD rules and turned into RDF, which for the most part appears OK.

However, for simple numerical values such as those in the attachment (see "qb:order": 1 etc.), I would expect the resulting RDF to be qb:order 1, where the value's datatype is the default of xsd:int, whereas csv2rdf outputs qb:order "1"^^xsd:long.

example.zip

ajtucker avatar Aug 10 '21 15:08 ajtucker

Yeah ok fair-cop :-)

I believe xsd:integer is the correct datatype according to the spec; it represents integers of any size, whereas xsd:int is the 32-bit one.

Is this a real problem for you though? An xsd:long is an xsd:integer after all :-)

I've not looked, but I suspect I know what is happening. Basically, in Clojure the default integer type for values within a 64-bit range is a long. Values that overflow that range are auto-promoted to bignums, though, which means I'd expect that if you put qb:order 99999999999999999999999999999999 in there, it will end up as an xsd:integer. In grafter we map all these Clojure/Java types to the appropriate xsd types. The rationale is that we then don't need to use xsd:integer everywhere (which is theoretically problematic on systems with limited numeric types).
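To make the behaviour described above concrete, here is a minimal sketch in Python (not grafter's actual code, which is Clojure). The function names are hypothetical and exist only for illustration; the first mimics choosing the datatype from the value's magnitude, the second applies a single default for every JSON integer.

```python
XSD = "http://www.w3.org/2001/XMLSchema#"

# Bounds of a signed 64-bit integer (a Java/Clojure long).
LONG_MIN, LONG_MAX = -(2**63), 2**63 - 1


def range_based_datatype(n: int) -> str:
    """Hypothetical: pick the datatype from the size of the value,
    as the comment above suspects csv2rdf currently does."""
    if LONG_MIN <= n <= LONG_MAX:
        return XSD + "long"
    return XSD + "integer"


def default_integer_datatype(n: int) -> str:
    """Hypothetical: always use xsd:integer, whatever the magnitude."""
    return XSD + "integer"


print(range_based_datatype(1))
print(range_based_datatype(99999999999999999999999999999999))
print(default_integer_datatype(1))
```

Under the first rule the datatype of a value depends on how large it happens to be; under the second it is the same for every integer, which is what the issue asks for.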

I'm sure we can fix this to coerce to a default bignum / xsd:integer. However I'm curious if you think it's really important/necessary?

Also triplestores may coerce all numeric values to xsd:integer anyway.

RickMoynihan avatar Sep 08 '21 12:09 RickMoynihan

You're right, I should've said xsd:integer!

We noticed it in a PR: https://github.com/GSS-Cogs/gss-utils/pull/306#discussion_r686084733

I was trying to match the example in the spec: https://www.w3.org/TR/vocab-data-cube/#attachment-example

It's a bit surprising that some JSON(LD) bare integer literal value gets turned into an xsd:long that then in Turtle needs an explicit datatype, rather than the bare integer literal.

E.g.:

{ "qb:order": 1 }

ends up as

qb:order "1"^^xsd:long

rather than

qb:order 1
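The two Turtle forms differ because Turtle's bare-number shorthand for whole numbers denotes an xsd:integer literal, so any other datatype has to be written out in full. A rough Python sketch of how a serializer might decide between the two forms follows; `turtle_literal` and the `PREFIXES` table are hypothetical names, and string escaping of the lexical form is deliberately left out.

```python
import re

XSD = "http://www.w3.org/2001/XMLSchema#"
XSD_INTEGER = XSD + "integer"

# Hypothetical prefix table, used only to abbreviate datatype IRIs.
PREFIXES = {"xsd": XSD}


def turtle_literal(lexical: str, datatype_iri: str) -> str:
    """Render a typed literal, using the bare shorthand only for xsd:integer.

    Assumes `lexical` contains no characters that need escaping.
    """
    if datatype_iri == XSD_INTEGER and re.fullmatch(r"[+-]?[0-9]+", lexical):
        return lexical
    for prefix, namespace in PREFIXES.items():
        if datatype_iri.startswith(namespace):
            local = datatype_iri[len(namespace):]
            return f'"{lexical}"^^{prefix}:{local}'
    return f'"{lexical}"^^<{datatype_iri}>'


print("qb:order", turtle_literal("1", XSD + "long"))
print("qb:order", turtle_literal("1", XSD_INTEGER))
```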

I would expect that when parsing the JSON-LD you would map to the equivalent Clojure representation and/or keep the type so that it round trips properly, rather than leaving it to Clojure to guess that small numbers fit into fewer bits :)

It's not a biggie, just unexpected. It doesn't affect us particularly, though we will need to change the test if you fix it!

ajtucker avatar Sep 08 '21 14:09 ajtucker

> I would expect that when parsing the JSON-LD you would map to the equivalent Clojure representation and/or keep the type so that it round trips properly, rather than leaving it to Clojure to guess that small numbers fit into fewer bits :)

As far as I know we do this higher up.

I suspect the bug is that we're just running with the type that our JSON parser has given us, and it will I'd imagine give us a Long (as that's the default in clojure).

i.e. the bug is, I think, that JSON-LD wants us to assume a default of bignum during our JSON parse, rather than what we're doing, which is to assume it's either a long or a bignum. That is arguably better, but not what the spec says.

i.e. a fix would I think be to coerce all integers in the JSON tree to bignums before reinterpreting the types with the context.
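As a rough illustration of that idea, here is a Python sketch rather than the Clojure fix itself. Python integers are already unbounded, so the `BigInt` wrapper below is a hypothetical stand-in for a bignum type, and `parse_metadata` and `datatype_of` are names invented for this example. The point is only that every JSON integer gets the same type at parse time, so the later type mapping no longer depends on magnitude.

```python
import json
from dataclasses import dataclass

XSD = "http://www.w3.org/2001/XMLSchema#"


@dataclass(frozen=True)
class BigInt:
    """Hypothetical stand-in for a Clojure/Java bignum."""
    value: int


def parse_metadata(text: str):
    # json.loads calls parse_int with the digit string of every JSON
    # integer, so all integers end up as BigInt regardless of size.
    return json.loads(text, parse_int=lambda digits: BigInt(int(digits)))


def datatype_of(value) -> str:
    """Hypothetical type-to-datatype mapping applied after parsing."""
    if isinstance(value, BigInt):
        return XSD + "integer"
    if isinstance(value, float):
        return XSD + "double"
    raise TypeError(f"no datatype mapping for {type(value).__name__}")


doc = parse_metadata('{"qb:order": 1}')
print(datatype_of(doc["qb:order"]))
```

Any explicit type coercion from the JSON-LD context would still be applied afterwards; this sketch only covers the default case where no type is given.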

RickMoynihan avatar Sep 08 '21 16:09 RickMoynihan