JSON.jl

parsing of arrays results in array of type Any

Open s2maki opened this issue 9 years ago • 4 comments

julia> JSON.parse("[1,2,3]")
3-element Array{Any,1}:
 1
 2
 3

Why is this an Any array? Shouldn't it behave the same way as Julia's own array literals? For example, [1,2,3] produces an Int array, [1,2,3.0] produces a Float64 array, and [1,2,"3"] produces an Any array.
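
For reference, the literal behavior described above can be checked directly. This is a small sketch assuming a 64-bit Julia 1.x session, where `Int` is `Int64` and `Vector{T}` is an alias for `Array{T,1}`:

```julia
# Element types Julia infers for array literals.
a = [1, 2, 3]       # all Int: stays Int
b = [1, 2, 3.0]     # Int and Float64: promoted to Float64
c = [1, 2, "3"]     # no common promotion: falls back to Any

println(typeof(a))
println(typeof(b))
println(typeof(c))
```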

s2maki avatar Apr 21 '16 14:04 s2maki

JSON is not a typed container format. For example, you want to be able to parse a JSON string, add an item (which can be of any type!) to a list, and then write the data back to a new JSON string. You can't do this if the lists are not parsed as Any arrays (which is what JSON thinks they are).
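
A minimal sketch of that constraint, using plain Julia arrays rather than JSON.jl itself (the variable names are only for illustration): appending a value of another type to a narrowly typed vector fails, while the same append on an Any vector succeeds.

```julia
ints = [1, 2, 3]          # Vector{Int}
anys = Any[1, 2, 3]       # Vector{Any}

# Appending a String to a Vector{Int} throws, because the String
# cannot be converted to Int.
threw = try
    push!(ints, "four")
    false
catch err
    err isa MethodError
end

# The same append works on a Vector{Any}.
push!(anys, "four")

println(threw)
println(anys)
```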

(If you want to store Julia data in an efficient type-preserving way, see the JLD package.)

stevengj avatar Apr 21 '16 14:04 stevengj

I don't think anyone wants to use JSON to store data in a type preserving way. Rather, the more common use case is to use JSON as a transport layer between Julia and websites (REST APIs, etc.).

If a JSON array comes back as a list of numbers, it's probably more common to use it as a list of numbers than to amend the list and send it back. Parsing into an array of Any was probably easier code to write, but it is more expensive in both time and memory. For example, a million-element integer array could consume 8 MB as a JSON string, and also 8 MB as an Int array, but upwards of 40 MB as an Any array (I'm not sure how much a single jl_type_t actually consumes, but I think it's 32 bytes plus a pointer). Why waste memory by a factor of five unless you really need to? Parsed data should generally produce the most compactly stored result, as Julia does natively.
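
The memory difference can be measured rather than estimated. The sketch below uses `Base.summarysize`, which reports the reachable size of an object in bytes. The exact figures depend on the Julia version and platform, so none are asserted here.

```julia
n = 1_000_000
ints = collect(1:n)          # Vector{Int}: values stored inline
anys = Vector{Any}(ints)     # Vector{Any}: pointers to boxed values

int_bytes = Base.summarysize(ints)
any_bytes = Base.summarysize(anys)

println("Vector{Int}: ", int_bytes, " bytes")
println("Vector{Any}: ", any_bytes, " bytes")
println("ratio: ", any_bytes / int_bytes)
```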

And if you do have the case where you want to append an item of a different type to a parsed array, you can always convert it to an Any array before you do. Then you're only wasting memory by about 20%.
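
A sketch of that conversion, assuming the parser had returned a `Vector{Int}` (which it does not today):

```julia
parsed = [1, 2, 3]                 # stand-in for a narrowly typed parse result

# Copy into a Vector{Any} only when a mixed-type append is needed.
widened = Vector{Any}(parsed)
push!(widened, "four")

println(typeof(parsed))
println(typeof(widened))
```

The original `parsed` vector is left untouched; the copy is the temporary extra memory mentioned above.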

s2maki avatar Apr 21 '16 15:04 s2maki

Just for the record: I experimented with speculative typing in #140, and benchmark results were generally worse. This doesn't nullify the memory argument, but it's quite likely that speculative typing will be generally more time consuming except on inputs with long homogeneous arrays.

This makes sense: rudimentary benchmarking shows that pushing to a Vector{Int} is not that much faster than pushing to a Vector{Any}. The extra overhead from the type instability of the array costs more than the narrower type saves.
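
To illustrate where that overhead comes from, here is a rough sketch of one way speculative typing could work. It is not the code from #140; `push_widen` and `collect_speculative` are hypothetical names. The vector starts narrow and is copied into a wider one whenever an element does not fit, so the variable holding it changes type inside the loop.

```julia
# Hypothetical helper: append x, widening the element type only when needed.
function push_widen(v::Vector, x)
    if x isa eltype(v)
        push!(v, x)
        return v
    end
    # Mis-speculation: copy everything into a vector with a wider element type.
    wider = Vector{typejoin(eltype(v), typeof(x))}(v)
    push!(wider, x)
    return wider
end

# Hypothetical driver: build a vector whose element type follows the data.
function collect_speculative(items)
    isempty(items) && return Any[]
    v = [first(items)]                  # element type of the first item
    for x in Iterators.drop(items, 1)
        v = push_widen(v, x)            # the type of v may change here
    end
    return v
end

homogeneous = collect_speculative(Any[1, 2, 3])
mixed = collect_speculative(Any[1, 2, "3"])
println(typeof(homogeneous))
println(typeof(mixed))
```

Note that `typejoin` gives the common supertype (for example `Real` for `Int` and `Float64`), which differs from the promotion that array literals apply.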

TotalVerb avatar Jun 11 '16 02:06 TotalVerb

I think I will revisit this as part of #169, which will hopefully allow all arrays to be type-narrowed after parsing. This is not necessarily better for parsing performance, but is probably better for parsing data that will be used again later.
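
In the meantime, narrowing after parsing can be approximated in user code. The following is only a sketch, not what #169 implements; `narrow` is a hypothetical helper. It relies on `map` choosing the result element type from the values it actually sees.

```julia
# Hypothetical helper: recursively narrow the element types of nested vectors.
narrow(x) = x                          # leave non-array values alone
narrow(v::Vector) = map(narrow, v)     # map picks an element type from the results

flat = narrow(Any[1, 2, 3])
nested = narrow(Any[Any[1, 2], Any[3]])
mixed = narrow(Any[1, "two"])

println(typeof(flat))
println(typeof(nested))
println(typeof(mixed))
```

This makes a full copy of each array, so it trades extra work after parsing for a more compact result that is cheaper to use later.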

TotalVerb avatar Sep 29 '16 21:09 TotalVerb