RJSONIO

When parsed, integers above a certain value are treated as numerics/floats

Open ekorchmar opened this issue 3 weeks ago • 0 comments

  1. When read with fromJSON(), big integer literals are not stored as explicit R integers.
  2. When written with toJSON(), big integers are treated as numeric and may lose precision.

Reproducing the issue

> RJSONIO::toJSON(12345)
[1] "[    12345 ]"
> RJSONIO::toJSON(123456)
[1] "[   123456 ]"
> RJSONIO::toJSON(1234567)
[1] "[  1234567 ]"
> RJSONIO::toJSON(12345678)
[1] "[ 1.234568e+07 ]"
> RJSONIO::toJSON(12345678, digits = 23)  # Workaround
[1] "[                 12345678 ]"
>

Possible cause

String conversion is done by formatC with the default value digits = 5 (set on line 164): https://github.com/duncantl/RJSONIO/blob/ec0dd20fb0841aff06ce33545441d34b51ab49cc/R/json.R#L173

I am not sure why the "numeric" code path is chosen for integer inputs. Maybe it is because R itself is very picky about what counts as an "integer": a literal such as 5 is stored as a double unless it is written as 5L or converted with as.integer().

> is.integer(5)
[1] FALSE
> five <- as.integer(5)
> is.integer(five)
[1] TRUE
> five
[1] 5
> 5 == five  # However!
[1] TRUE
> RJSONIO::toJSON(as.integer(12345678))  # Another workaround
[1] "[ 12345678 ]"
>
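The mechanism can be illustrated in base R alone. The sketch below only assumes that the numeric path formats values with a fixed number of significant digits in formatC's "g" style; it does not reproduce RJSONIO's exact padding or its effective precision (the 1.234568e+07 above shows more than 5 digits), so treat it as an illustration of the general effect, not a trace of json.R.

```r
# A whole-number literal without the L suffix is stored as a double,
# so it takes the "numeric" path rather than the "integer" one.
x_dbl <- 12345678
x_int <- 12345678L
typeof(x_dbl)  # "double"
typeof(x_int)  # "integer"

# With only 5 significant digits, "g"-style formatting switches to
# scientific notation and drops the trailing digits.
formatC(x_dbl, digits = 5, format = "g")   # "1.2346e+07"

# Raising the precision keeps every digit (the digits = 23 workaround).
formatC(x_dbl, digits = 23, format = "g")  # "12345678"

# Integers default to format "d" and are printed in full.
formatC(x_int)                             # "12345678"
```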

Could it be intentional?

The tests tiptoe around the issue by only requiring parsed big integers to pass is.numeric(), although C bigints can be R integers too, and users would expect them to be: https://github.com/duncantl/RJSONIO/blob/ec0dd20fb0841aff06ce33545441d34b51ab49cc/tests/bigInt.R

Because JSON originally comes from JavaScript land, where every number is a hand-wavy numeric, the format does not define a distinction between integers and non-integers. However, I would argue that keeping verifiable integers as integers is a reasonable expectation. There is less harm in accidentally coercing 2.00 into 2 than in coercing big integers into floats (see "Real world impact" below).

JavaScript (both V8 in the browser and Node.js), Python's native json library, and R's jsonlite all keep big integers as integers.
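The precision concern can be made concrete in base R. An R numeric is an IEEE 754 double, which represents every whole number exactly only up to 2^53, while R's native integer type is 32-bit and tops out at .Machine$integer.max. The short check below shows both limits; it does not involve RJSONIO.

```r
# R's native integers are 32-bit signed; this is the largest one.
.Machine$integer.max        # 2147483647

# Doubles hold consecutive whole numbers exactly only up to 2^53.
big <- 2^53
(big + 1) == big            # TRUE: 2^53 + 1 rounds back to 2^53
sprintf("%.0f", big + 1)    # "9007199254740992"

# Below that limit a whole-number double is exact; only the
# printing precision decides whether digits are lost on output.
sprintf("%.0f", 12345678)   # "12345678"
```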

Real world impact

This causes issues when RJSONIO is used to ingest and then pass on JSON objects containing integer fields with large values.

R is a language of choice in the OHDSI community, and RJSONIO is used in some cases to parse and pipe API outputs, for example https://github.com/OHDSI/ROhdsiWebApi/issues/152 (fixed there with a workaround).

Workarounds

  1. For toJSON(), supply a high enough value for digits so that formatC keeps whole numbers "implicitly integer" (e.g. digits = 23, as in the reproduction above);
  2. Explicitly convert all nested numeric nodes with as.integer() (although one would expect fields generated by fromJSON() to already pass is.integer()).
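Workaround 2 can be automated with a small recursive pass over the parsed structure. The helper as_integer_where_whole() below is hypothetical (it is not part of RJSONIO); it assumes fromJSON() returned nested lists of numeric vectors, and it converts a double vector only when every non-NA element is a whole number within R's 32-bit integer range.

```r
# Hypothetical helper, not part of RJSONIO: walk a parsed JSON structure
# and turn whole-number double vectors into integer vectors.
as_integer_where_whole <- function(x) {
  if (is.list(x)) {
    # Recurse element by element; [[<- keeps names intact.
    # NULL elements are skipped so they are not dropped from the list.
    for (i in seq_along(x)) {
      if (!is.null(x[[i]])) x[[i]] <- as_integer_where_whole(x[[i]])
    }
    return(x)
  }
  if (is.double(x)) {
    # Whole, finite, and small enough to fit in a 32-bit R integer.
    ok <- !is.na(x) & x == round(x) & abs(x) <= .Machine$integer.max
    # Convert only if every non-NA element qualifies (and at least one does).
    if (all(ok | is.na(x)) && any(ok)) {
      storage.mode(x) <- "integer"  # keeps names and dims
    }
  }
  x
}

# Usage on a hand-built list shaped like fromJSON() output.
parsed <- list(id = 12345678, score = 2.5, tags = list(count = 3, ratio = 0.1))
fixed <- as_integer_where_whole(parsed)
str(fixed)
```

Values beyond .Machine$integer.max are deliberately left as doubles, since R has no native 64-bit integer type. Converting 2.0 to 2L follows the argument above that this is the lesser harm.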

ekorchmar, Feb 06 '25 11:02