Compatibility with tokio
Hey :)
So `serde_json`'s serializer and deserializer support `Write` and `Read` respectively, but unfortunately they aren't compatible with tokio.
The basic rule of I/O with `tokio` (I should contribute some docs somewhere) is that any downstream I/O operation can fail with `WouldBlock`, which means "come back later". So any operations must be retryable. That isn't the case with `serde_json` right now, though. For example:
- In `deserialize_enum`, `remaining_depth` will not be incremented again if `visit_enum` fails. To avoid general issues of this kind, an RAII-style lock/guard pattern might help.
- The serializer heavily uses `write_all`, which is incompatible with `tokio`. Instead, you'd need to use an internal buffer.
I'd personally say just documenting the incompatibility is fine for now.
I'll soon be open sourcing a library to help out with testing I/O operations, so hopefully you'll find that of use here :)
The library is now available as `partial-io` on crates.io (repo at https://github.com/facebookincubator/rust-partial-io). You could use `quickcheck` to do roundtrip tests and hopefully find bugs pretty easily.
This coupled with any improvements we can make to `from_reader` would make me very happy.
Do you have any thoughts on this @dtolnay? I wouldn't mind spending some time in the near-future (vague GitHub commitment) experimenting with this.
My impression was that for streaming serialization and deserialization use cases you typically want the data to be framed, as with `tokio-serde-json`. You mostly only care about JSON processing time if your server is CPU-bound, in which case you don't want to spend extra CPU time on incremental parsing if you can parse faster from a complete buffer all at once. If the server is not CPU-bound, JSON processing time would not meaningfully affect latency. Can you explain more about the use case where you care about this or #160?
I might approach this by building a resumable serde_json::Value parser. Resumable meaning it can make some progress, hit a WouldBlock, and resume later from the previous incomplete state. I don't see us solving the general case of resumable parser for an arbitrary implementation of Deserialize. But going through serde_json::Value, most of the branching and processing (interpreting string escape sequences, decimal-to-binary conversion of numbers) can happen while waiting on I/O and later turning Value into T will be fast.
Sorry @dtolnay, I may have overstated my 'need' for this. It's more something I'd be interested to explore. I found my way here while playing with async response bodies in `reqwest`. Since the response can return `WouldBlock` if a chunk hasn't been received, we need to buffer it completely before handing over to `serde_json`. Chunks don't necessarily correspond to a frame in a codec. There are strategies around that with the current status quo, like collecting the chunks first without buffering, or just accepting the copy, which is probably going to be faster anyway.
But if `serde_json` did support non-blocking reads in some way, then I'd like to see what that would mean for general response deserialisation, since it probably wouldn't result in more code than buffering first on my end. Having only a resumable `Value` parser sounds reasonable to me.
So that's a bit of a flimsy motivation, but I'd like to play with it.
Triage: I still think a resumable/nonblocking way to deserialize specifically a serde_json::Value is the way to go; we don't need to solve the much bigger challenge of resuming an arbitrary implementation of Deserialize. I would be interested to have an implementation of this that we can benchmark.
Since I haven't worked with tokio myself -- what's the most obvious API that would support nonblocking deserialize of Value from a byte stream? If one of you could provide a sketch of the public API here, I'll file the idea in https://github.com/dtolnay/request-for-implementation (or you can file it directly).
@dtolnay it’s been a while since I’ve thought about this 🙂 So I'll post on the `tokio` dev channels, give it some thought, and circle back here, unless somebody else beats me to it.
@KodrAus I don't want to beat you, I just wanna say it's still a wanted feature 😉
I feel like we might be able to leverage `async`/`await` to deal with the progress tracking/continuation state rather than make the deserializer support re-entry. That would also allow arbitrary `Deserialize` implementations. Here's what I'm thinking:
- Copy `de::Deserializer` to a new type `de::AsyncDeserializer`.
- Specialize it to `de::IoRead` by replacing any uses of `de::Read` with the concrete implementation in `IoRead`. The whole type should be generic over `AsyncRead`.
- Replace any blocking I/O calls with the equivalent `AsyncReadExt` call, `await` it, and mark the containing function `async`.
- Follow the compiler errors up the stack, adding `.await` and making methods `async` as you go.
- Find any bits between `de::Deserializer` and `de::AsyncDeserializer` that are still the same and factor them out.
- Profit?
> You mostly only care about JSON processing time if your server is CPU-bound, in which case you don't want to spend extra CPU time on incremental parsing if you can parse faster from a complete buffer all at once. If the server is not CPU-bound, JSON processing time would not meaningfully affect latency.
@dtolnay in situations where the JSON doesn't fit into RAM and you fetch it in an async manner via streams, what is the expected out-of-the-box solution?
Is there any progress on this issue?