
Compatibility with tokio

Open sunshowers opened this issue 7 years ago • 10 comments

Hey :)

So serde_json's serializer and deserializer support Write and Read respectively, but unfortunately they aren't compatible with tokio.

The basic rule of I/O with tokio (I should contribute some docs somewhere) is that any downstream I/O operation can fail with WouldBlock, which means "come back later". So any operations must be retryable. That isn't the case with serde_json right now, though. For example:

  • In deserialize_enum, remaining_depth will not be incremented again if visit_enum fails. To avoid general issues of this kind, an RAII-style lock/guard pattern might help (a minimal sketch of that idea follows this list).
  • The serializer heavily uses write_all, which is incompatible with tokio. Instead, you'd need to use an internal buffer.
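For illustration, here is a minimal sketch of that RAII-style guard idea, assuming a plain depth counter; the names (DepthGuard and so on) are hypothetical, not serde_json's actual internals:

```rust
// Hypothetical guard: the depth counter is restored on drop, so it stays
// balanced even if a nested parse bails out early (e.g. with WouldBlock)
// and is retried later.
struct DepthGuard<'a> {
    remaining_depth: &'a mut u8,
}

impl<'a> DepthGuard<'a> {
    /// Returns None if the recursion limit is exhausted.
    fn enter(remaining_depth: &'a mut u8) -> Option<Self> {
        *remaining_depth = remaining_depth.checked_sub(1)?;
        Some(DepthGuard { remaining_depth })
    }
}

impl Drop for DepthGuard<'_> {
    fn drop(&mut self) {
        // Runs on success and on error alike, so a failed visit_enum can be
        // retried without the counter drifting.
        *self.remaining_depth += 1;
    }
}
```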

I'd personally say just documenting the incompatibility is fine for now.

I'll soon be open sourcing a library to help out with testing I/O operations, so hopefully you'll find that of use here :)

sunshowers avatar May 07 '17 21:05 sunshowers

The library is now available as partial-io on crates.io (repo at https://github.com/facebookincubator/rust-partial-io). You could use quickcheck to do roundtrip tests and hopefully find bugs pretty easily.
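To show the kind of failure such tests would inject, here is a hand-rolled stand-in for what partial-io automates (partial-io wraps this up with quickcheck integration): a reader that returns WouldBlock on alternating calls, which the current from_reader cannot recover from.

```rust
use std::io::{self, Read};

// A reader that hands back at most one byte per call and returns WouldBlock
// on every other call, imitating a socket that isn't ready yet.
struct Interrupting<R> {
    inner: R,
    interrupt_next: bool,
}

impl<R: Read> Read for Interrupting<R> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        if self.interrupt_next {
            self.interrupt_next = false;
            return Err(io::Error::new(io::ErrorKind::WouldBlock, "not ready"));
        }
        self.interrupt_next = true;
        let n = buf.len().min(1);
        self.inner.read(&mut buf[..n])
    }
}

fn main() {
    let data = br#"{"key": [1, 2, 3]}"#;
    let reader = Interrupting { inner: &data[..], interrupt_next: false };
    // The current deserializer errors out at the first WouldBlock instead of
    // resuming, which is the incompatibility described in this issue.
    let value: Result<serde_json::Value, _> = serde_json::from_reader(reader);
    println!("{:?}", value);
}
```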

sunshowers avatar Jun 04 '17 22:06 sunshowers

This, coupled with any improvements we can make to from_reader, would make me very happy.

Do you have any thoughts on this @dtolnay? I wouldn't mind spending some time in the near-future (vague GitHub commitment) experimenting with this.

KodrAus avatar Jul 26 '17 00:07 KodrAus

My impression was that for streaming serialization and deserialization use cases you typically want the data to be framed, as with tokio-serde-json. You mostly only care about JSON processing time if your server is CPU-bound, in which case you don't want to spend extra CPU time on incremental parsing if you can parse faster from a complete buffer all at once. If the server is not CPU-bound, JSON processing time would not meaningfully affect latency. Can you explain more about the use case where you care about this or #160?

I might approach this by building a resumable serde_json::Value parser. Resumable meaning it can make some progress, hit a WouldBlock, and resume later from the previous incomplete state. I don't see us solving the general case of resumable parser for an arbitrary implementation of Deserialize. But going through serde_json::Value, most of the branching and processing (interpreting string escape sequences, decimal-to-binary conversion of numbers) can happen while waiting on I/O and later turning Value into T will be fast.
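To make that concrete, here is one hypothetical shape such a parser could take; none of these types or names exist in serde_json, so this is purely a sketch:

```rust
// Hypothetical resumable Value parser: the caller feeds bytes as they arrive
// and the parser either asks for more input or returns the finished Value.
use serde_json::Value;

pub enum Status {
    /// Input ran out mid-document; call `feed` again once more bytes are
    /// available (e.g. after a WouldBlock has cleared).
    NeedMore,
    /// A complete JSON document has been parsed.
    Done(Value),
}

pub struct ValueParser {
    // Incomplete state lives here between calls: the stack of open arrays
    // and objects, a partially decoded string or number, and so on.
}

impl ValueParser {
    pub fn new() -> Self {
        ValueParser {}
    }

    /// Consume as much of `input` as possible. String escapes and number
    /// conversion are handled here, while waiting on I/O, so turning the
    /// final Value into a T afterwards stays cheap.
    pub fn feed(&mut self, input: &[u8]) -> Result<Status, serde_json::Error> {
        // Sketch only; a real implementation would update the internal
        // stack from `input` and return Done once the document closes.
        let _ = input;
        unimplemented!()
    }
}
```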

dtolnay avatar Jul 26 '17 01:07 dtolnay

Sorry @dtolnay, I may have overstated my 'need' for this. It's more something I'd be interested to explore. I found my way here while playing with async response bodies in reqwest. Since the response can return WouldBlock if a chunk hasn't been received yet, we need to buffer it completely before handing it over to serde_json, and chunks don't necessarily correspond to a frame in a codec. There are strategies around that with the current status quo, like collecting the chunks first without buffering, or just accepting the copy, which is probably going to be faster anyway.
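For reference, the buffer-first approach looks roughly like this with today's tokio (a sketch only; the reqwest of that era was on futures 0.1, so this is not the original code):

```rust
use tokio::io::{AsyncRead, AsyncReadExt};

// Read the whole body into memory, then parse. WouldBlock never reaches
// serde_json: the runtime simply suspends this future until more data
// arrives, and parsing only starts once the body is complete.
async fn read_json<R, T>(mut body: R) -> Result<T, Box<dyn std::error::Error>>
where
    R: AsyncRead + Unpin,
    T: serde::de::DeserializeOwned,
{
    let mut buf = Vec::new();
    body.read_to_end(&mut buf).await?;
    Ok(serde_json::from_slice(&buf)?)
}
```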

But if serde_json did support non-blocking reads in some way, I'd like to see what that would mean for general response deserialisation, since it probably wouldn't result in more code than buffering first on my end. A resumable parser just for Value sounds reasonable to me.

So that's a bit of a flimsy motivation, but I'd like to play with it.

KodrAus avatar Jul 26 '17 04:07 KodrAus

Triage: I still think a resumable/nonblocking way to deserialize specifically a serde_json::Value is the way to go; we don't need to solve the much bigger challenge of resuming an arbitrary implementation of Deserialize. I would be interested to have an implementation of this that we can benchmark.

Since I haven't worked with tokio myself -- what's the most obvious API that would support nonblocking deserialize of Value from a byte stream? If one of you could provide a sketch of the public API here, I'll file the idea in https://github.com/dtolnay/request-for-implementation (or you can file it directly).

dtolnay avatar Jul 01 '19 03:07 dtolnay

@dtolnay it’s been a while since I’ve thought about this 🙂 So I'll post on the tokio dev channels, give it some thought, and circle back here, unless somebody else beats me to it.

KodrAus avatar Jul 01 '19 04:07 KodrAus

@KodrAus I don't want to beat you, I just wanna say it's still a wanted feature 😉

kazigk avatar Dec 14 '20 05:12 kazigk

I feel like we might be able to leverage async/await to deal with the progress tracking/continuation state rather than make the deserializer support re-entry. That would also allow arbitrary Deserialize implementations. Here's what I'm thinking:

  • Copy de::Deserializer to a new type de::AsyncDeserializer.
  • Specialize it to de::IoRead by replacing any uses of de::Read with the concrete implementation in IoRead. The whole type should be generic over AsyncRead.
  • Replace any blocking I/O calls with the equivalent AsyncReadExt call, await it, and mark the containing function async (see the sketch after this list).
  • Follow the compiler errors up the stack, adding .await and making methods async as you go.
  • Find any bits between de::Deserializer and de::AsyncDeserializer that are still the same and factor them out.
  • Profit?
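A minimal sketch of that mechanical transformation on a made-up helper, not serde_json's real internals:

```rust
use tokio::io::{AsyncRead, AsyncReadExt};

// Blocking flavor, roughly what the IoRead-backed deserializer does today.
fn next_byte_blocking<R: std::io::Read>(reader: &mut R) -> std::io::Result<Option<u8>> {
    let mut buf = [0u8; 1];
    match reader.read(&mut buf)? {
        0 => Ok(None),
        _ => Ok(Some(buf[0])),
    }
}

// Async flavor: identical logic, but the read is awaited and the containing
// function becomes async, which is the change that then ripples up the stack.
async fn next_byte_async<R: AsyncRead + Unpin>(reader: &mut R) -> std::io::Result<Option<u8>> {
    let mut buf = [0u8; 1];
    match reader.read(&mut buf).await? {
        0 => Ok(None),
        _ => Ok(Some(buf[0])),
    }
}
```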

jonhoo avatar Feb 27 '21 17:02 jonhoo

You mostly only care about JSON processing time if your server is CPU-bound, in which case you don't want to spend extra CPU time on incremental parsing if you can parse faster from a complete buffer all at once. If the server is not CPU-bound, JSON processing time would not meaningfully affect latency.

@dtolnay in situations where the JSON doesn't fit into RAM and you fetch it asynchronously via streams, what is the expected out-of-the-box solution?

xamgore avatar Jul 19 '21 08:07 xamgore

Is there any progress on this issue?

silence-coding avatar Apr 25 '23 15:04 silence-coding