support user-defined mapping for Inf and NaN via keyword arg
This is a keyword-argument approach to implement #292 as an alternative to #293.
@quinnj We're aware that you're super busy and don't want to add any pressure. Just let us know if there's anything we can do to help you expedite this PR (it's really small) to get it out of your work queue. Thanks!
If you review this, please consider my question from https://github.com/quinnj/JSON3.jl/issues/292#issuecomment-2481607624
@quinnj We're about to release a new version of the GenieFramework, and a solution to the Inf issue would be super welcome. Could you let us know whether there's a chance of getting this PR merged?
The API seems reasonable. Writing NaN and Inf under allow_inf is about 2x slower with this PR than before (probably due to the changed writing implementation). IDK how important perf is in this case. Performance in other cases is unaffected. I can't comment on the correctness without knowing much about the codebase, though I do know that it should probably have tests.
julia> @b rand([Inf, NaN, -Inf], 100) JSON3.write(_, allow_inf=true)
244.709 ns (3 allocs: 2.047 KiB) # release
496.429 ns (3 allocs: 2.047 KiB) # pr
julia> @b rand(100) JSON3.write(_, allow_inf=true)
2.488 μs (7 allocs: 6.594 KiB) # release
2.486 μs (7 allocs: 6.594 KiB) # pr
In my tests I see 388 ns (#main) vs. 501 ns (#hh-infinity2), which I would consider acceptable. If you're also ok with that, I'd add some tests.
Fixed the performance issue for default mappings and added underscore_inf_mapping and quoted_inf_mapping plus respective docstrings and tests.
julia> JSON3.write([Inf, -Inf, NaN], inf_mapping = JSON3.quoted_inf_mapping) |> println
["Infinity","-Infinity","NaN"]
julia> JSON3.write([Inf, -Inf, NaN], inf_mapping = JSON3.underscore_inf_mapping) |> println
["__inf__","__neginf__","__nan__"]
EDIT: updated code example
We could also provide a
hacky_inf_mapping(x) = x == Inf ? "1e1000" : x == -Inf ? "-1e1000" : "\"__nan__\""
which would correctly produce Infinity and -Infinity in typical browser implementations of deserialisation. Any version that generates NaN would be welcome, as would a different name...
Adding quoted_inf_mapping and underscore_inf_mapping feels like standard proliferation, other than that this seems reasonable. It's plausible the custom mapping case could be faster but I don't think that's blocking as long as it doesn't hurt perf in the allow_inf=true case.
- I tried various things to speed up custom mappings, but both NTuples and Vectors were slower by more than a factor of 2.
- What do you propose concerning quoted_inf_mapping and underscore_inf_mapping? Should we provide a demo_inf_mapping, or should we simply place an example in the docs? And if so, which example would we choose?
Definitely in the docs; and at that point it doesn't really matter much which.
All done
Sorry to be so slow on the review here; I've indeed been busy and am woefully behind on github notifications. Always feel free to give me a ping on slack, which is the only platform I stay up to date on these days. I'm going to start looking at this now.
(also thanks to @LilithHafner for jumping in to help review here)
If anyone has time to look at the CI failures, I'd appreciate it. I'm a bit busy, but could probably look in the next few days.
CI failures look to be the same as the failures on main.
Found a way to read inf_mappings, so please hold off on merging.
I added read support for inf_mapping, provided that the mapping uses string values and not other types such as 1e1000.
julia> inf_mapping(x) = x == Inf ? "\"__inf__\"" : x == -Inf ? "\"__neginf__\"" : "\"__nan__\"";
julia> JSON3.read("""["__nan__", {"a": "__inf__"},["__neginf__"]]"""; inf_mapping)
3-element JSON3.Array{Any, Base.CodeUnits{UInt8, String}, Vector{UInt64}}:
NaN
{
"a": Inf
}
[-Inf]
If this meets your expectation, I'll add docs and tests.
One more thought:
We could also define the inf_mapping such that there is no need to add the extra quotes. If people really want to supply non-string values, they could supply JSONText, e.g. JSONText("1e1000").
In that case a typical inf_mapping would look like
inf_mapping(x) = x == Inf ? "__inf__" : x == -Inf ? "__neginf__" : "__nan__"
EDIT: Just realised that there is no JSONText in JSON3, so probably not a good idea.
How do we proceed? Change to a tuple or named tuple? If so, could you paste a piece of code here? I'm not sure whether I understood how you did it.
I don't have a piece of code to paste. I'm also not that invested in the API here; happy to hear @quinnj's thoughts.
I retried a tuple solution, which is only slightly slower than the functional mapping. With
fn_mapping(x::Real) = x == Inf ? "\"__inf__\"" : x == -Inf ? "\"__neginf__\"" : "\"__nan__\""
tuple_mapping = ("\"__inf__\"", "\"__neginf__\"", "\"__nan__\"")
x = rand([Inf, NaN, -Inf], 1000)
y = JSON3.write.(x, inf_mapping=fn_mapping)
jy = join(y, "\", \"")
I obtain
| Operation | fn_mapping | tuple_mapping | allow_inf = true |
|---|---|---|---|
| JSON3.write(x, …) | 3.688 μs | 4.114 μs | 3.375 μs |
| JSON3.read.(y, …) | 5.908 ms | 6.035 ms | 5.927 ms |
| JSON3.read.(codeunits.(y), …) | 48.6 μs | 49.7 μs | 46.1 μs |
| JSON3.read(jy, …) | 177.292 ns | 307.983 ns | 165.138 ns |
My tuple implementation is available as branch hh-infinity-tuple.
EDIT: updated table with values for allow_inf = true
EDIT2: updated table with UInt8 parsing; the slow parsing performance in the second line results from the isfile() check (see #302); (all results measured on Windows 11)
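For reference, a tuple mapping could be consumed at write time roughly like this. This is only a hedged sketch of the idea, not the actual code on hh-infinity-tuple; special_float_string is a hypothetical helper name:

```julia
# Hypothetical sketch: select the serialized form of a special float
# from a tuple mapping (inf, neginf, nan); return nothing for finite values
# so the caller can fall back to the normal number-writing path.
special_float_string(x::Real, m::Tuple) =
    x == Inf ? m[1] : x == -Inf ? m[2] : isnan(x) ? m[3] : nothing

tuple_mapping = ("\"__inf__\"", "\"__neginf__\"", "\"__nan__\"")

special_float_string(Inf, tuple_mapping)   # "\"__inf__\""
special_float_string(NaN, tuple_mapping)   # "\"__nan__\""
special_float_string(1.5, tuple_mapping)   # nothing
```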
If performance is the deciding criterion, we could define a macro @inf_mapping that takes three strings and defines an anonymous function.
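Such a macro might look like the following. This is a sketch of the idea only (the macro is not part of this PR); it interpolates the three strings into the branches at expansion time, so the resulting anonymous function carries them as inlined constants:

```julia
# Hypothetical @inf_mapping macro: builds x -> ... with the three
# replacement strings spliced in as literals.
macro inf_mapping(inf, neginf, nan)
    quote
        x -> x == Inf ? $(esc(inf)) : x == -Inf ? $(esc(neginf)) : $(esc(nan))
    end
end

m = @inf_mapping "\"__inf__\"" "\"__neginf__\"" "\"__nan__\""

m(Inf)   # "\"__inf__\""
m(NaN)   # "\"__nan__\""
```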