csv2rdf
Integers turned into longs
When adding extra metadata in `-metadata.json`, I expect the metadata to be parsed as per the JSON-LD rules and turned into RDF, which for the most part appears OK.

However, for simple numerical values as in the attachment (see `"qb:order": 1` etc.), I would expect the resulting RDF to be `qb:order 1`, where the value's datatype is the default of `xsd:int`, whereas csv2rdf outputs `qb:order "1"^^xsd:long`.
Yeah, OK, fair cop :-)

I believe `xsd:integer` is the correct value according to the spec; it represents the type of any integer, while `xsd:int` is a 32-bit one.

Is this a real problem for you though? An `xsd:long` is an `xsd:integer` after all :-)
I've not looked, but I suspect I know what is happening. Basically, in Clojure the default numeric integer type for values within a 64-bit range is a `long`. They will, however, auto-promote to bignums if they overflow, which means I'd expect that if you put `qb:order 99999999999999999999999999999999` in there it will end up as an `xsd:integer`. In Grafter we map all these Clojure/Java types to appropriate XSD types. The rationale here is that it means we don't need to use `xsd:integer` everywhere (which is theoretically problematic on systems with limited numeric types).
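For illustration, a minimal REPL sketch of the Clojure numeric behaviour described above (this is just the language defaults, not csv2rdf code):

```clojure
;; Integer literals within the 64-bit range are read as Longs;
;; anything larger is promoted to a BigInt by the reader.
(class 1)
;; => java.lang.Long
(class 99999999999999999999999999999999)
;; => clojure.lang.BigInt
```

JVM JSON parsers typically make the same choice, handing back a `java.lang.Long` for small integers, which is presumably the value Grafter's type mapping then sees.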
I'm sure we can fix this to coerce to a default bignum / `xsd:integer`. However I'm curious if you think it's really important/necessary?

Also, triplestores may coerce all numeric values to `xsd:integer` anyway.
You're right, I should've said `xsd:integer`!
We noticed it in a PR: https://github.com/GSS-Cogs/gss-utils/pull/306#discussion_r686084733
I was trying to match the example in the spec: https://www.w3.org/TR/vocab-data-cube/#attachment-example
It's a bit surprising that a bare JSON(-LD) integer literal gets turned into an `xsd:long`, which then needs an explicit datatype in Turtle, rather than staying a bare integer literal.
E.g. `{ "qb:order": 1 }` ends up as `qb:order "1"^^xsd:long` rather than `qb:order 1`.
I would expect that when parsing the JSON-LD you would map to the equivalent Clojure representation and/or keep the type so that it round trips properly, rather than leaving it to Clojure to guess that small numbers fit into fewer bits :)
It's not a biggie, just unexpected. It doesn't affect us particularly, though we will NEED TO CHANGE THE TEST if you fix it!
> I would expect that when parsing the JSON-LD you would map to the equivalent Clojure representation and/or keep the type so that it round trips properly, rather than leaving it to Clojure to guess that small numbers fit into fewer bits :)
As far as I know we do this higher up.

I suspect the bug is that we're just running with the type our JSON parser has given us, which I'd imagine is a `Long` (as that's the default in Clojure).

I.e. I think the bug is that JSON-LD wants us to assume a default of bignum during our JSON parse, rather than what we're doing, which is to assume it's either a long or a bignum. That's arguably better, but not what the spec says.

So a fix would, I think, be to coerce all integers in the JSON tree to bignums before reinterpreting the types with the context.
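A minimal sketch of what that coercion pass could look like, assuming the parsed metadata arrives as ordinary Clojure data structures (the `widen-integers` name and the use of `clojure.walk` are illustrative, not csv2rdf's actual implementation):

```clojure
(require '[clojure.walk :as walk])

;; Hypothetical helper: widen every integer in a parsed JSON tree to a
;; BigInt, so that the later Clojure-type-to-XSD mapping picks
;; xsd:integer rather than xsd:long.
(defn widen-integers [json-tree]
  (walk/postwalk
   (fn [x]
     (if (integer? x)
       (bigint x) ; Long -> clojure.lang.BigInt; BigInt passes through
       x))
   json-tree))

(widen-integers {"qb:order" 1})
;; => {"qb:order" 1N}   ; 1N is Clojure's BigInt literal syntax
```

Run before the JSON-LD context is applied, this would presumably make `qb:order 1` serialise with the default `xsd:integer` datatype.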