JSON3.jl

Support user-defined functions for serialising Inf and NaN

Open hhaensel opened this issue 1 year ago • 9 comments

Currently, Inf, -Inf and NaN are written as Infinity, -Infinity and NaN if allow_inf = true is passed to JSON3.write().

Unfortunately, the standard JSON parser in the browser does not support this syntax. Typical workarounds are regex substitution of Infinity to "Infinity", which is slow and error-prone.
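To illustrate why the substitution workaround is error-prone, here is a minimal sketch that uses only Julia's Base `replace`. The payloads are made up for illustration. Quoting every bare Infinity token also rewrites string values that merely contain the word, and it turns -Infinity into something that is still not valid JSON.

```julia
# Hypothetical payload: one string value happens to contain the word "Infinity".
raw = "{\"title\":\"Infinity War\",\"score\":Infinity}"

# Naive fix-up: wrap every occurrence of the token in quotes.
patched = replace(raw, "Infinity" => "\"Infinity\"")

# The numeric field is now a string, but the title has been corrupted as well:
# {"title":""Infinity" War","score":"Infinity"}
println(patched)

# A negative infinity ends up as -"Infinity", which no JSON parser accepts.
patched_neg = replace("[-Infinity]", "Infinity" => "\"Infinity\"")
println(patched_neg)
```

A more careful pattern could skip string contents, but at that point the post-processing step is effectively a second JSON tokenizer.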

If only Inf needs to be translated, a nice hack is to write Inf as 1e1000, which the built-in number parser converts to Infinity without any special handling in the JSON parser. If NaN is needed as well, the only option I am aware of is a reviver. Since there is no standard format that I could propose, I thought a customisable solution would be nice.

I propose to support user-defined translations via value types; Inf, -Inf and NaN are output as RawType() for which the users can define their own values, e.g.

JSON3.rawbytes(::Val{Inf}) = codeunits("1e1000")
JSON3.rawbytes(::Val{-Inf}) = codeunits("-1e1000")
JSON3.rawbytes(::Val{NaN}) = codeunits("\"__nan__\"")

so that

julia> JSON3.write((a = Inf, b = -Inf32, c = NaN), allow_inf = true)
"{\"a\":1e1000,\"b\":-1e1000,\"c\":\"__nan__\"}"

I've prepared a PR, which I will submit for consideration. There is a slight performance reduction of 1% vs. the existing treatment of Inf, while NaNs are handled a bit faster. I'd consider the changes negligible, given that these values occur rather rarely.
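The value-type idea can be sketched in plain Julia without touching JSON3 internals. The names `nonfinite_text`, `render` and `render_object` below are hypothetical stand-ins and are not the PR's implementation or any JSON3 API; the sketch only shows how dispatch on `Val{Inf}`, `Val{-Inf}` and `Val{NaN}` lets a user override the emitted text.

```julia
# Hypothetical hook: users overload one method per non-finite value.
nonfinite_text(::Val{Inf})  = "1e1000"
nonfinite_text(::Val{-Inf}) = "-1e1000"
nonfinite_text(::Val{NaN})  = "\"__nan__\""

# Render one floating-point number as JSON text.
function render(x::AbstractFloat)
    isfinite(x) && return string(x)
    # Normalise to the Float64 constants so that Inf32 or a NaN with a
    # different bit pattern still selects one of the three methods above.
    key = isnan(x) ? NaN : (x > 0 ? Inf : -Inf)
    return nonfinite_text(Val(key))
end

# Render a flat NamedTuple of floats as a JSON object (illustration only).
function render_object(nt::NamedTuple)
    fields = ("\"$k\":" * render(v) for (k, v) in pairs(nt))
    return "{" * join(fields, ",") * "}"
end

println(render_object((a = Inf, b = -Inf32, c = NaN)))
```

Note that `Val(key)` is built from a runtime value here, so the call to `nonfinite_text` is dispatched dynamically.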

hhaensel avatar Nov 14 '24 23:11 hhaensel

... or would you rather prefer a version via keyword argument?

hhaensel avatar Nov 16 '24 21:11 hhaensel

I've added another version with a keyword argument under the branch hh-infinity2. I first tried a Dict mapping, but that performed way slower, so I went with a functional mapping.

julia> mapping(x) = x == Inf ? "__inf__" : x == -Inf ? "-1e1000" : "__nan__"
julia> JSON3.write([Inf32,-Inf32, NaN], inf_mapping = mapping)
"[__inf__,-1e1000,__nan__]"

EDIT: corrected return value

hhaensel avatar Nov 17 '24 00:11 hhaensel

One thought: could it be that a number is not finite but also neither NaN nor Inf? Then the default mapping should probably rather look like this:

_std_mapping(x) = x == Inf ? "Infinity" : x == -Inf ? "-Infinity" : isnan(x) ? "NaN" : string(x)
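A minimal sketch of how such a mapping function could be threaded through a writer as a keyword argument. `render_number`, `render_array`, `std_mapping` and `browser_mapping` are hypothetical names chosen for this illustration and do not reflect the code on the branch. For IEEE 754 floating-point values every non-finite value is either an infinity or a NaN, so the `string(x)` fallback only matters for other `Real` types.

```julia
# Default mapping, following the shape suggested above.
std_mapping(x) = x == Inf ? "Infinity" : x == -Inf ? "-Infinity" : isnan(x) ? "NaN" : string(x)

# Alternative mapping aimed at a browser-side parser (assumed sentinel for NaN).
browser_mapping(x) = x == Inf ? "1e1000" : x == -Inf ? "-1e1000" : "\"__nan__\""

# Finite numbers are printed as usual; everything else goes through the mapping.
render_number(x::Real; inf_mapping = std_mapping) =
    isfinite(x) ? string(x) : inf_mapping(x)

function render_array(xs; inf_mapping = std_mapping)
    items = (render_number(x; inf_mapping = inf_mapping) for x in xs)
    return "[" * join(items, ",") * "]"
end

data = [1.5, Inf, -Inf32, NaN]
println(render_array(data))                                 # default mapping
println(render_array(data; inf_mapping = browser_mapping))  # browser-friendly mapping
```

Because the mapping is an ordinary argument, the same data can be serialised differently for different consumers within one program.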

hhaensel avatar Nov 17 '24 21:11 hhaensel

After some re-thinking, I have a slight preference for the kwarg-solution, because users could serialize for different purposes/backends in one application.

hhaensel avatar Nov 17 '24 21:11 hhaensel

Can someone help here? @LilithHafner @JeffBezanson (just saw that you committed to JSON3 recently)

I think having a serialization of Inf and NaN that passes normal browsers so that users can write their own revivers is something strongly desirable.
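For illustration, the reviver idea can be sketched in Julia as a recursive pass over already-parsed data; in the browser, this role is played by the reviver argument of JSON.parse. The function name `revive` and the sentinel strings are assumptions made for this example, and the input is built by hand rather than parsed by a JSON library.

```julia
# Leave anything we do not recognise untouched.
revive(x) = x

# Turn sentinel strings back into non-finite numbers.
revive(s::AbstractString) =
    s == "__inf__" ? Inf : s == "__neginf__" ? -Inf : s == "__nan__" ? NaN : s

# Recurse into arrays and dictionaries.
revive(v::AbstractVector) = map(revive, v)
revive(d::AbstractDict) = Dict(k => revive(v) for (k, v) in d)

# Hand-built stand-in for the result of parsing sentinel-encoded JSON.
parsed = Dict("a" => "__inf__", "b" => Any[1.0, "__nan__", "text"], "c" => "__neginf__")
restored = revive(parsed)
println(restored)
```

The obvious limitation is that a genuine string equal to one of the sentinels would be converted too, so the sentinels have to be chosen with the data in mind.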

hhaensel avatar Jan 14 '25 12:01 hhaensel

@hhaensel, sorry this stalled after you put a lot of effort into it. I'm fine if you want to go with the approach in your PR. I haven't made it very clear anywhere in this repo, but I've actually been working on a JSON.jl 1.0 release that takes the best parts of JSON.jl and JSON3.jl and combines them into one package (JSON3.jl would then be deprecated). In that regard, if you want me to merge your PR and tag a release to unblock you, I'm fine with that.

For the other work I'm doing, I want to address this issue, and I'm wondering why we wouldn't just allow passing JSON.json(x; ninf="-1e1000")? Were there performance issues with that approach, so that you went with the function approach instead?

quinnj avatar Apr 23 '25 13:04 quinnj

Yes, IIRC it was for performance reasons.

LilithHafner avatar Apr 23 '25 21:04 LilithHafner

In #294 I have a tabular comparison of the different approaches, which I repeat below. The two implementations don't differ much in performance, so the choice is rather a matter of taste, I think. Both versions now also support reading.

If you think we should rather go with a named tuple approach, I could also try modifying the tuple version. But please comment before I put some effort into that.

fn_mapping(x::Real) = x == Inf ? "\"__inf__\"" : x == -Inf ? "\"__neginf__\"" : "\"__nan__\""
tuple_mapping = ("\"__inf__\"", "\"__neginf__\"", "\"__nan__\"")

x = rand([Inf, NaN, -Inf], 1000)
y = JSON3.write.(x, inf_mapping=fn_mapping)
jy = join(y, "\", \"")

I obtain

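| Operation                     | fn_mapping | tuple_mapping | allow_inf = true |
|-------------------------------|------------|---------------|------------------|
| JSON3.write(x, …)             | 3.688 μs   | 4.114 μs      | 3.375 μs         |
| JSON3.read.(y, …)             | 5.908 ms   | 6.035 ms      | 5.927 ms         |
| JSON3.read.(codeunits.(y), …) | 48.6 μs    | 49.7 μs       | 46.1 μs          |
| JSON3.read(jy, …)             | 177.292 ns | 307.983 ns    | 165.138 ns       |
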
  • My tuple implementation is available as branch hh-infinity-tuple.
  • The slow parsing in the second row (JSON3.read.(y, …)) results from the isfile() check, which can be circumvented since the latest patch.
  • All results were measured on Windows 11.

hhaensel avatar Apr 24 '25 08:04 hhaensel

We can probably close this, as JSON v1 has adopted this idea?

hhaensel avatar Nov 04 '25 21:11 hhaensel