JSON.jl

Parse bytes directly

Open robsmith11 opened this issue 1 year ago • 5 comments

It would be nice if JSON.Parser.parse could accept a vector of bytes and parse it as UTF-8, without the caller having to allocate a new String manually. My most common use case (and probably that of many other people?) is downloading a JSON file with HTTP.get("...").body, which returns bytes.

robsmith11, Apr 20 '23 04:04
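For reference, the caller-side workaround this request wants to avoid looks roughly like the following. This is a minimal sketch, assuming JSON.jl is installed. The byte vector is built locally as a stand-in for `HTTP.get("...").body`, so no network access is needed, and the sample JSON is made up for illustration.

```julia
using JSON

# Stand-in for HTTP.get("...").body: a Vector{UInt8} holding UTF-8 JSON.
body = Vector{UInt8}("{\"a\": 1, \"b\": [1, 2]}")

# Wrap the bytes in a String, then parse. String(body) takes over the
# data: `body` is emptied afterwards (see the String docs quoted below).
data = JSON.parse(String(body))

println(data["a"])
```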

You could maybe use https://github.com/JuliaStrings/StringViews.jl.

KristofferC, Apr 20 '23 06:04
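With StringViews.jl, the wrapping happens at the call site instead. A minimal sketch, assuming both JSON.jl and StringViews.jl are installed and that `JSON.parse` accepts any `AbstractString` (a `StringView` is one). Whether JSON.jl copies the data internally after receiving the view is not something this sketch checks.

```julia
using JSON
using StringViews

# Stand-in for HTTP.get("...").body.
body = Vector{UInt8}("{\"a\": 1, \"b\": [1, 2]}")

# StringView wraps the byte vector as an AbstractString instead of
# converting it to a String.
sv = StringView(body)
data = JSON.parse(sv)
```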

StringViews.jl does look good for use in projects, but would it make sense for more casual interactive use to have JSON.jl do something automatically when passed bytes?

robsmith11, Apr 20 '23 07:04

One issue with that is that, arguably, anything that accepts a string should then also accept a byte buffer. The best way to do that would probably be to take StringViews as a dependency and wrap the bytes in a StringView. So it would be roughly equivalent, except that every such function would have to define this method instead of just the caller doing the wrapping.

KristofferC, Apr 20 '23 09:04
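The library-side alternative discussed here would amount to a byte-accepting method that does the same wrapping internally. A sketch under stated assumptions: `parse_bytes` is a hypothetical name, not part of JSON.jl, and it is written as a separate function because adding a `JSON.parse(::AbstractVector{UInt8})` method from outside the package would be type piracy. It assumes JSON.jl and StringViews.jl are installed.

```julia
using JSON
using StringViews

# Hypothetical helper (not part of JSON.jl): accept raw UTF-8 bytes and
# forward to the existing AbstractString method via a StringView wrapper.
# Keyword arguments are passed through to JSON.parse unchanged.
parse_bytes(v::AbstractVector{UInt8}; kwargs...) = JSON.parse(StringView(v); kwargs...)

data = parse_bytes(Vector{UInt8}("[1, 2, 3]"))
```

Every string-consuming function that wanted to accept bytes would need its own method like this one, which is the duplication the comment points out.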

I've noticed that using StringViews instead of String does not improve performance for me (in fact, I see slightly worse performance and more allocations). The following is from the docs for String (Julia 1.8.5). If I understand correctly, strings created from UTF-8 bytes already act like views.

String(v::AbstractVector{UInt8}) Create a new String object from a byte vector v containing UTF-8 encoded characters. ... When possible, the memory of v will be used without copying when the String object is created. This is guaranteed to be the case for byte vectors returned by take! on a writable IOBuffer and by calls to read(io, nb). This allows zero-copy conversion of I/O data to strings. In other cases, Vector{UInt8} data may be copied, but v is truncated anyway to guarantee consistent behavior.

kpa28-git, May 05 '23 20:05
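The documented behavior can be observed with Base alone. A minimal sketch: a plain `Vector{UInt8}` is emptied by `String(v)` whether or not a copy was made, and bytes from `take!` on a writable `IOBuffer` are one of the cases the docs guarantee to be zero-copy. The code can only observe the truncation, not whether a copy actually happened.

```julia
# A plain Vector{UInt8}: String(v) may copy, but v is truncated either way.
v = Vector{UInt8}("{\"a\": 1}")
s = String(v)
@assert s == "{\"a\": 1}"
@assert isempty(v)       # the source vector has been emptied

# To keep the bytes usable afterwards, hand String a copy instead.
w = Vector{UInt8}("[1, 2, 3]")
s2 = String(copy(w))
@assert s2 == "[1, 2, 3]"
@assert length(w) == 9   # w itself is untouched

# Bytes from take! on a writable IOBuffer: a documented zero-copy case.
io = IOBuffer()
write(io, "{\"b\": 2}")
buf = take!(io)
s3 = String(buf)
@assert s3 == "{\"b\": 2}"
```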

"When possible"

That is not often the case; the array needs to have been allocated in a special way for this.

And copying a chunk of memory like a string tends to be quite fast, so it is plausible that you don't notice the copy. It is also possible that StringViews has some issue that makes it slower than it should be.

KristofferC, May 05 '23 22:05