JSON.jl
Parse bytes directly
It would be nice if `JSON.Parser.parse` could be passed a vector of bytes and parse it assuming UTF-8 encoding, without having to manually allocate a new `String`. My most common use case (probably for many other people too?) is downloading a JSON file with `HTTP.get("...").body`, which returns bytes.
You could maybe use https://github.com/JuliaStrings/StringViews.jl.
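A minimal sketch of what that suggestion could look like on the caller's side. It assumes the StringViews.jl and JSON.jl packages are installed and that `JSON.parse` accepts any `AbstractString`; the `body` vector is a hypothetical stand-in for `HTTP.get("...").body`. Whether the parser avoids a copy internally is not something this sketch demonstrates.

```julia
using StringViews  # third-party package from JuliaStrings
using JSON

# Hypothetical stand-in for the bytes returned by HTTP.get("...").body
body = Vector{UInt8}("{\"a\": 1}")

# Wrap the bytes as an AbstractString; StringView does not take ownership
sv = StringView(body)
@assert sv isa AbstractString
@assert !isempty(body)   # the byte buffer is left intact

# Assumption: JSON.parse accepts a generic AbstractString
data = JSON.parse(sv)
@assert data["a"] == 1
```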
StringViews.jl does look good for use in projects, but would it make sense for more casual interactive use to have JSON.jl do something automatically when passed bytes?
One issue with that is that, arguably, anything that accepts a string should then also accept a byte buffer. The best way to do that would probably be to take StringViews as a dependency and wrap the bytes in it. So it would be more or less equivalent, except that every function would have to define this instead of the caller just doing it.
I've noticed that using `StringView` instead of `String` does not improve performance for me (actually slightly worse performance and more allocations). The following is from the docs for `String` (Julia 1.8.5). If I'm understanding it right, strings produced from UTF-8 bytes already act like views.
String(v::AbstractVector{UInt8}) Create a new String object from a byte vector v containing UTF-8 encoded characters. ... When possible, the memory of v will be used without copying when the String object is created. This is guaranteed to be the case for byte vectors returned by take! on a writable IOBuffer and by calls to read(io, nb). This allows zero-copy conversion of I/O data to strings. In other cases, Vector{UInt8} data may be copied, but v is truncated anyway to guarantee consistent behavior.
"When possible"
This is not the case that often; the array needs to have been allocated in a special way for it to work.
Copying a chunk of memory the size of a typical string also tends to be quite fast, so it is plausible that you simply don't notice the copy. And maybe StringViews has some issue that makes it slower than it should be.
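The part of the docstring that holds regardless of whether a copy happens can be checked with a small sketch using only Base. The sample bytes are made up for illustration; the sketch does not show whether the memory was actually reused, only that the vector is emptied either way.

```julia
# Plain byte vector; nothing here guarantees that String can reuse its memory
bytes = Vector{UInt8}("{\"a\": 1}")

# Copy first when the bytes are still needed afterwards
s_copy = String(copy(bytes))
@assert !isempty(bytes)   # original buffer untouched

# Converting directly consumes the vector: per the docstring, `bytes` is
# truncated whether or not its memory was reused
s = String(bytes)
@assert isempty(bytes)
@assert s == s_copy
```

So `String(body)` is only a drop-in replacement when the byte vector is not needed afterwards.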