JSON3.jl icon indicating copy to clipboard operation
JSON3.jl copied to clipboard

support user-defined mapping for Inf and NaN via keyword arg

Open hhaensel opened this issue 1 year ago • 21 comments

This is a keyword-argument approach to implement #292 as an alternative to #293.

hhaensel avatar Nov 17 '24 21:11 hhaensel

@quinnj aware that you're super busy and don't want to add any pressure. Just let us know if there's anything we can do to help you expedite this PR (it's really small) to get it out of your work queue. Thanks!

essenciary avatar Nov 27 '24 08:11 essenciary

If you review this, please consider my question from https://github.com/quinnj/JSON3.jl/issues/292#issuecomment-2481607624

hhaensel avatar Nov 27 '24 17:11 hhaensel

@quinnj We're about releasing a new version of the GenieFramework and a solution to the Inf issue would be super welcome. Could you let us know, whether there's a chance of having this PR implemented?

hhaensel avatar Dec 19 '24 08:12 hhaensel

The API seems reasonable. Performance of writing NaN and Inf under allow_inf is a bout 2x slower with this PR than before (probably due to the changed writing implementation). IDK how important perf is in this case. Performance in other cases is unaffected. I can't comment on the correctness without knowing much about the codebase though I do know that it should probably have tests.

julia> @b rand([Inf, NaN, -Inf], 100) JSON3.write(_, allow_inf=true)
244.709 ns (3 allocs: 2.047 KiB) # release
496.429 ns (3 allocs: 2.047 KiB) # pr

julia> @b rand(100) JSON3.write(_, allow_inf=true)
2.488 μs (7 allocs: 6.594 KiB) # release
2.486 μs (7 allocs: 6.594 KiB) # pr

LilithHafner avatar Jan 14 '25 17:01 LilithHafner

In my tests I have a 388ns (#main) vs. 501ns (#hh-infinity2), which I would consider acceptable. If you're also ok with that, I'd add some tests.

hhaensel avatar Jan 15 '25 00:01 hhaensel

Fixed the performance issue for default mappings and added underscore_inf_mapping and quoted_inf_mapping plus respective docstrings and tests.

julia> JSON3.write([Inf, -Inf, NaN], inf_mapping = JSON3.quoted_inf_mapping) |> println
["Infinity","-Infinity","NaN"]

julia> JSON3.write([Inf, -Inf, NaN], inf_mapping = JSON3.underscore_inf_mapping) |> println
["__inf__","__neginf__","__nan__"]

EDIT: updated code example

hhaensel avatar Jan 15 '25 09:01 hhaensel

We could also provide a

hacky_inf_mapping(x) = x == Inf ? "1e1000" : x == -Inf ? "-1e1000" : "\"__nan__\""

which would correctly generate Infinity and -Infinity correctly for typical browser implementations of deserialisation. Any version to generate NaN would be welcome as well as a different name...

hhaensel avatar Jan 15 '25 10:01 hhaensel

Adding quoted_inf_mapping and underscore_inf_mapping feels like standard proliferation, other than that this seems reasonable. It's plausible the custom mapping case could be faster but I don't think that's blocking as long as it doesn't hurt perf in the allow_inf=true case.

LilithHafner avatar Jan 15 '25 20:01 LilithHafner

  • I tried various things to speed up custom mappings, but both NTuples and Vectors were slower by more than a factor of 2.
  • What do you propose concerning quoted_inf_mapping and underscore_mapping? Should we simply provide a demo_inf_mapping or should we simply place an example in the docs? And if so, what example would we chose?

hhaensel avatar Jan 15 '25 20:01 hhaensel

Definitely in the docs; and at that point it doesn't really matter much which.

LilithHafner avatar Jan 15 '25 20:01 LilithHafner

All done

hhaensel avatar Jan 15 '25 23:01 hhaensel

Sorry to be so slow on the review here; I've indeed been busy and am woefully behind on github notifications. Always feel free to give me a ping on slack, which is the only platform I stay up to date on these days. I'm going to start looking at this now.

(also thanks to @LilithHafner for jumping in to help review here)

quinnj avatar Jan 29 '25 04:01 quinnj

If anyone has time to look at the CI failures, I'd appreciate it. I'm a bit busy, but could probably look in the next few days.

quinnj avatar Jan 29 '25 17:01 quinnj

CI failures look to be the same as the failures on main.

LilithHafner avatar Jan 30 '25 14:01 LilithHafner

Found a way to read inf_mappings, so please wait with merging

hhaensel avatar Jan 31 '25 11:01 hhaensel

I added support for inf_mapping in case that the mapping used string values and not other types, e.g. like 1e1000.

julia> inf_mapping(x) = x == Inf ? "\"__inf__\"" : x == -Inf ? "\"__neginf__\"" : "\"__nan__\"";

julia> JSON3.read("""["__nan__", {"a": "__inf__"},["__neginf__"]]"""; inf_mapping)
3-element JSON3.Array{Any, Base.CodeUnits{UInt8, String}, Vector{UInt64}}:
 NaN
    {
   "a": Inf
}
    [-Inf]

If this meets your expectation, I'll add docs and tests.

hhaensel avatar Jan 31 '25 11:01 hhaensel

One more thought: We could also define the inf_mapping such that there is no need to add the extra quotes. If people really want to supply non-string values, they could supply JSONText, e.g. JSONText("1e1000"). In that case a typical inf_mapping would look like

inf_mapping(x) = x == Inf ? "__inf__" : x == -Inf ? "__neginf__" : "__nan__"

EDIT: Just realised that there is no JSONText in JSON3, so probably not a good idea.

hhaensel avatar Jan 31 '25 11:01 hhaensel

How do we proceed? Change to a tuple or named tuple? If so, could you paste a piece of code here? I'm not sure whether I understood how you did it.

hhaensel avatar Jan 31 '25 16:01 hhaensel

I don't have a piece of code to paste. I'm also not that invested in the API here; Happy to hear @quinnj's thoughts.

LilithHafner avatar Jan 31 '25 17:01 LilithHafner

I retried a tuple solution, which is only slightly inferior performancewise than the functional mapping. With

fn_mapping(x::Real) = x == Inf ? "\"__inf__\"" : x == -Inf ? "\"__neginf__\"" : "\"__nan__\""
tuple_mapping = ("\"__inf__\"", "\"__neginf__\"", "\"__nan__\"")

x = rand([Inf, NaN, -Inf], 1000)
y = JSON3.write.(x, inf_mapping=fn_mapping)
jy = join(y, "\", \"")

I obtain

Operation fn_mapping tuple_mapping allow_inf = true
JSON3.write(x, …) 3.688 μs 4.114 μs 3.375 μs
JSON3.read.(y, …) 5.908 ms 6.035 ms 5.927 ms
JSON3.read.(codeunits.(y), …) 48.6 μs 49.7 μs 46.1 μs
JSON3.read(jy, …) 177.292 ns 307.983 ns 165.138 ns

My tuple implementation is available as branch hh-infinity-tuple.

EDIT: updated table with values for allow_inf = true EDIT2: updated table with UInt8 parsing; the slow parsing performance in the second line results from the isfile() check (see #302); (all results measured on Windows 11)

hhaensel avatar Jan 31 '25 22:01 hhaensel

If performance should be the criterion, we could define a macro @inf_mapping which takes three strings and defines an anonymous function.

hhaensel avatar Jan 31 '25 22:01 hhaensel