RJSONIO
When parsed, integers above a certain value are treated as numerics/floats
- When read with the `fromJSON` method, big integer literals are not recorded as explicit R integers.
- When written with the `toJSON` method, big integers are treated as numeric and may lose precision.
Reproducing the issue
> RJSONIO::toJSON(12345)
[1] "[ 12345 ]"
> RJSONIO::toJSON(123456)
[1] "[ 123456 ]"
> RJSONIO::toJSON(1234567)
[1] "[ 1234567 ]"
> RJSONIO::toJSON(12345678)
[1] "[ 1.234568e+07 ]"
> RJSONIO::toJSON(12345678, digits = 23) # Workaround
[1] "[ 12345678 ]"
>
Possible cause
String conversion by `formatC` with the default value of `digits = 5` (line 164): https://github.com/duncantl/RJSONIO/blob/ec0dd20fb0841aff06ce33545441d34b51ab49cc/R/json.R#L173
I am not sure why the code path for "numeric" is chosen for integer inputs. Maybe it is because R itself is very picky about what counts as an "integer": a bare numeric literal is stored as a double unless it is explicitly converted.
> is.integer(5)
[1] FALSE
> five <- as.integer(5)
> is.integer(five)
[1] TRUE
> five
[1] 5
> 5 == five # However!
[1] TRUE
> RJSONIO::toJSON(as.integer(12345678)) # Another workaround
[1] "[ 12345678 ]"
>
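The suspected mechanism can be illustrated in base R alone. The sketch below is not RJSONIO's code: it only assumes, as suggested above, that a `formatC` call with a small `digits` value sits on the numeric path. The variable names are made up for illustration.

```r
# Base R only; RJSONIO is not needed for this sketch.
x_double  <- 12345678              # a bare literal is stored as a double
x_integer <- as.integer(12345678)  # explicit integer storage

typeof(x_double)   # "double"
typeof(x_integer)  # "integer"

# With only 5 significant digits, formatC switches to scientific
# notation for the double and drops the trailing digits.
as_double_5 <- formatC(x_double, digits = 5)    # "1.2346e+07"

# With enough significant digits, the same double is printed in full.
as_double_23 <- formatC(x_double, digits = 23)  # "12345678"

# Integer storage is formatted as a whole number regardless of digits.
as_int <- formatC(x_integer)                    # "12345678"
```

The strings produced here need not match RJSONIO's output character for character. The point is only that the same value formats differently depending on its storage type and on `digits`.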
Could it be intentional?
The tests tiptoe around the issue by only requiring parsed big integers to pass `is.numeric`, although C bigints can be R integers too, and users would expect them to be.
https://github.com/duncantl/RJSONIO/blob/ec0dd20fb0841aff06ce33545441d34b51ab49cc/tests/bigInt.R
As JSON originally comes from Javascript land, where everything is a hand-wavy numeric, the distinction between integers and non-integers is not defined. However, I would argue that keeping verifiable integers integer is a reasonable expectation. There is less harm in accidentally coercing `2.00` into `2` than in coercing big integers into floats (see "Real world impact" below).
Javascript (both browser V8 and Node), Python's native `json` library and R `jsonlite` all keep big integers integer.
Real world impact
This causes issues when RJSONIO is used to ingest and then pass forward JSON objects containing integer fields with large values.
R is a language of choice in the OHDSI community, and RJSONIO is used to parse and pipe API outputs in some cases: https://github.com/OHDSI/ROhdsiWebApi/issues/152 (fixed with a workaround).
Workarounds
- For `toJSON`, supply a high enough value for `digits` to have `formatC` keep things "implicitly integer";
- Explicitly convert all nested nodes with `as.integer` (but in this case, one would expect fields generated by `fromJSON` to already pass `is.integer`).
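A minimal sketch of the second workaround, in base R only. Both helper names (`to_integer_if_whole`, `coerce_whole_numbers`) are hypothetical and not part of RJSONIO. The example list stands in for whatever `fromJSON` returns.

```r
# Hypothetical helper: switch a double vector to integer storage when every
# value is a finite whole number that fits R's 32-bit integer range.
to_integer_if_whole <- function(x) {
  fits <- !anyNA(x) &&
    all(is.finite(x)) &&
    all(x == trunc(x)) &&
    all(abs(x) <= .Machine$integer.max)
  # storage.mode<- keeps names and other attributes, unlike as.integer().
  if (fits) storage.mode(x) <- "integer"
  x
}

# Hypothetical helper: walk a nested list and convert every double leaf,
# leaving strings, logicals and existing integers as they are.
coerce_whole_numbers <- function(node) {
  if (is.list(node)) return(lapply(node, coerce_whole_numbers))
  if (is.double(node)) return(to_integer_if_whole(node))
  node
}

# Stand-in for a parsed JSON object.
parsed <- list(id = 12345678, ratio = 0.5, tags = list("a", 2))
fixed  <- coerce_whole_numbers(parsed)

is.integer(fixed$id)     # TRUE
is.integer(fixed$ratio)  # FALSE, 0.5 is not a whole number
```

Going by the session above, passing `fixed` to `RJSONIO::toJSON` should then take the integer path. Values above `.Machine$integer.max` cannot be stored as R integers, so they stay double and would still need the `digits` workaround.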