Compatibility with tokio
Hey :)
So `serde_json`'s serializer and deserializer support `Write` and `Read` respectively, but unfortunately they aren't compatible with tokio.
The basic rule of I/O with tokio (I should contribute some docs somewhere) is that any downstream I/O operation can fail with `WouldBlock`, which means "come back later". So every operation must be retryable. That isn't the case with `serde_json` right now, though. For example:
- In `deserialize_enum`, `remaining_depth` will not be incremented again if `visit_enum` fails. To avoid general issues of this kind, an RAII-style lock/guard pattern might help.
- The serializer heavily uses `write_all`, which is incompatible with tokio. Instead, you'd need to use an internal buffer.
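To make "retryable" concrete, here's a minimal sketch using plain `std::io` types (not tokio's actual traits); in real tokio code you'd suspend and get polled again rather than spin, but the state requirement is the same:

```rust
use std::io::{self, Read};

// Minimal sketch: a read that tolerates WouldBlock. The key property is
// that nothing is left half-updated when the error surfaces, so the whole
// operation can simply be attempted again later.
fn read_retrying<R: Read>(reader: &mut R, buf: &mut [u8]) -> io::Result<usize> {
    loop {
        match reader.read(buf) {
            Ok(n) => return Ok(n),
            // "Come back later": in tokio you'd yield to the reactor here
            // and get polled again; the busy loop is only for illustration.
            Err(ref e) if e.kind() == io::ErrorKind::WouldBlock => continue,
            Err(e) => return Err(e),
        }
    }
}
```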
I'd personally say just documenting the incompatibility is fine for now.
I'll soon be open-sourcing a library to help out with testing I/O operations, so hopefully you'll find that of use here :)
The library is now available as `partial-io` on crates.io (repo at https://github.com/facebookincubator/rust-partial-io). You could use quickcheck to do roundtrip tests and hopefully find bugs pretty easily.
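For example, something along these lines (a sketch; check the `partial-io` docs for the exact `PartialOp`/`PartialRead` API):

```rust
use std::io::{Cursor, ErrorKind};
use partial_io::{PartialOp, PartialRead};

fn main() {
    let data = br#"{"key": "value"}"#.to_vec();
    // Inject a WouldBlock partway through the input.
    let ops = vec![
        PartialOp::Limited(4),                  // only hand out 4 bytes
        PartialOp::Err(ErrorKind::WouldBlock),  // then "come back later"
        PartialOp::Unlimited,                   // then everything else
    ];
    let reader = PartialRead::new(Cursor::new(data), ops);

    // Today the deserializer gives up at the WouldBlock rather than
    // resuming, so this errors out partway through.
    let result: Result<serde_json::Value, _> = serde_json::from_reader(reader);
    assert!(result.is_err());
}
```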
That testing approach, coupled with any improvements we can make to `from_reader`, would make me very happy.
Do you have any thoughts on this @dtolnay? I wouldn't mind spending some time in the near future (vague GitHub commitment) experimenting with this.
My impression was that for streaming serialization and deserialization use cases you typically want the data to be framed, as with `tokio-serde-json`. You mostly only care about JSON processing time if your server is CPU-bound, in which case you don't want to spend extra CPU time on incremental parsing if you can parse faster from a complete buffer all at once. If the server is not CPU-bound, JSON processing time would not meaningfully affect latency. Can you explain more about the use case where you care about this or #160?
I might approach this by building a resumable `serde_json::Value` parser. Resumable meaning it can make some progress, hit a `WouldBlock`, and resume later from the previous incomplete state. I don't see us solving the general case of a resumable parser for an arbitrary implementation of `Deserialize`. But going through `serde_json::Value`, most of the branching and processing (interpreting string escape sequences, decimal-to-binary conversion of numbers) can happen while waiting on I/O, and later turning `Value` into `T` will be fast.
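As a strawman of the interface (names are hypothetical; a real implementation would keep incremental parse state instead of re-parsing the buffer on every call):

```rust
use serde_json::Value;

/// Hypothetical push-based resumable parser. This naive version just
/// buffers input and re-attempts a full parse on each feed; it only
/// demonstrates the interface, not the incremental internals.
pub struct ValueParser {
    buf: Vec<u8>,
}

impl ValueParser {
    pub fn new() -> Self {
        ValueParser { buf: Vec::new() }
    }

    /// Feed the next chunk of input. Returns Ok(Some(value)) once a
    /// complete JSON value is available, Ok(None) if the input is still
    /// incomplete, and Err(_) on a genuine syntax error.
    pub fn feed(&mut self, chunk: &[u8]) -> serde_json::Result<Option<Value>> {
        self.buf.extend_from_slice(chunk);
        match serde_json::from_slice(&self.buf) {
            Ok(value) => Ok(Some(value)),
            Err(e) if e.is_eof() => Ok(None), // need more input; call feed again
            Err(e) => Err(e),
        }
    }
}
```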
Sorry @dtolnay, I may have overstated my 'need' for this. It's more something I'd be interested to explore. I found my way here while playing with async response bodies in reqwest. Since the response can return `WouldBlock` if a chunk hasn't been received, we need to buffer it completely before handing it over to `serde_json`. Chunks don't necessarily correspond to a frame in a codec. There are strategies around that with the current status quo, like collecting the chunks into a list without flattening them into one buffer, or just accepting the copy, which is probably going to be faster anyway.
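For reference, the buffer-everything status quo looks roughly like this (a sketch assuming a futures-style `Stream` of byte chunks, which is approximately what reqwest's async body gives you):

```rust
use futures::{Stream, StreamExt};
use std::io;

/// Sketch of the status quo: collect every body chunk into one buffer,
/// then hand the complete slice to serde_json in a single shot.
async fn read_json<S>(mut body: S) -> io::Result<serde_json::Value>
where
    S: Stream<Item = io::Result<Vec<u8>>> + Unpin,
{
    let mut buf = Vec::new();
    while let Some(chunk) = body.next().await {
        buf.extend_from_slice(&chunk?);
    }
    serde_json::from_slice(&buf)
        .map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))
}
```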
But if `serde_json` did support non-blocking reads in some way, then I'd like to see what that would mean for general response deserialisation, since it probably wouldn't result in more code than buffering first on my end. Having a resumable parser for `Value` only sounds reasonable to me.
So that's a bit of a flimsy motivation, but I'd like to play with it.
Triage: I still think a resumable/nonblocking way to deserialize specifically a `serde_json::Value` is the way to go; we don't need to solve the much bigger challenge of resuming an arbitrary implementation of `Deserialize`. I would be interested to have an implementation of this that we can benchmark.
Since I haven't worked with tokio myself: what's the most obvious API that would support nonblocking deserialization of a `Value` from a byte stream? If one of you could provide a sketch of the public API here, I'll file the idea in https://github.com/dtolnay/request-for-implementation (or you can file it directly).
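For instance, is the obvious shape just an async analogue of `from_reader`? Something like this strawman, where the naive body buffers everything and the interesting work would be doing it incrementally:

```rust
use serde_json::Value;
use tokio::io::{AsyncRead, AsyncReadExt};

/// Strawman entry point: an async version of from_reader for Value.
/// This placeholder body buffers the whole input before parsing; a real
/// implementation would parse incrementally as bytes arrive.
pub async fn from_async_reader<R>(mut reader: R) -> std::io::Result<Value>
where
    R: AsyncRead + Unpin,
{
    let mut buf = Vec::new();
    reader.read_to_end(&mut buf).await?;
    serde_json::from_slice(&buf)
        .map_err(|e| std::io::Error::new(std::io::ErrorKind::InvalidData, e))
}
```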
@dtolnay it’s been a while since I’ve thought about this 🙂 I'll post on the tokio dev channels, give it some thought, and circle back here, unless somebody else beats me to it.
@KodrAus I don't want to beat you, I just wanna say it's still a wanted feature 😉
I feel like we might be able to leverage async/await to deal with the progress-tracking/continuation state rather than make the deserializer support re-entry. That would also allow arbitrary `Deserialize` implementations. Here's what I'm thinking:
- Copy `de::Deserializer` to a new type `de::AsyncDeserializer`.
- Specialize it to `de::IoRead` by replacing any uses of `de::Read` with the concrete implementation in `IoRead`. The whole type should be generic over `AsyncRead`.
- Replace any blocking I/O calls with the equivalent `AsyncReadExt` call, `await` it, and mark the containing function `async` (see the sketch after this list).
- Follow the compiler errors up the stack, adding `.await` and making methods `async` as you go.
- Find any bits between `de::Deserializer` and `de::AsyncDeserializer` that are still the same and factor them out.
- Profit?
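To illustrate the third and fourth steps, here's a rough sketch with made-up internals (this isn't `serde_json`'s actual private API):

```rust
use tokio::io::{AsyncRead, AsyncReadExt};

// Made-up stand-in for the deserializer's reader internals, just to show
// the mechanical blocking -> async rewrite.
struct AsyncDeserializer<R> {
    reader: R,
    scratch: [u8; 1],
}

impl<R: AsyncRead + Unpin> AsyncDeserializer<R> {
    // The blocking version would call self.reader.read(&mut self.scratch)?;
    // the async version awaits the same operation. The compiler captures
    // the in-progress state in the generated future, which is what lets us
    // skip hand-written re-entry support.
    async fn next_byte(&mut self) -> std::io::Result<Option<u8>> {
        match self.reader.read(&mut self.scratch).await? {
            0 => Ok(None), // EOF
            _ => Ok(Some(self.scratch[0])),
        }
    }
}
```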
> You mostly only care about JSON processing time if your server is CPU-bound, in which case you don't want to spend extra CPU time on incremental parsing if you can parse faster from a complete buffer all at once. If the server is not CPU-bound, JSON processing time would not meaningfully affect latency.
@dtolnay in situations where the JSON doesn't fit into RAM and you fetch it asynchronously via streams, what is the expected out-of-the-box solution?
Is there any progress on this issue?