Streaming decode

Open zuiderkwast opened this issue 2 years ago • 3 comments

Hi!

Parsing a document incrementally, feeding the decoder chunks as they arrive as TCP packets over the socket, could speed up the total handling of a request.

We're receiving ~2MB JSON requests over HTTP/2, where each document is interleaved with other requests on the same HTTP/2 connection. This makes the per-request latency even higher when there are many concurrent requests on the same connection.

One idea is to use an API similar to jsx's streaming decode API:

1> {incomplete, F} = jsx:decode(<<"[">>, [stream]).
{incomplete,#Fun<jsx_decoder.1.122947756>}
2> F(end_stream).  % can also be `F(end_json)`
** exception error: bad argument
3> {incomplete, G} = F(<<"]">>).
{incomplete,#Fun<jsx_decoder.1.122947756>}
4> G(end_stream).  % can also be `G(end_json)`
[]

We could use another representation too. Any preference? Would you be willing to accept a PR?

zuiderkwast avatar Apr 05 '22 15:04 zuiderkwast

Hi, thank you for proposing this new feature. It seems interesting, and I'd like to think it over carefully this weekend or next.

Would you be willing to accept a PR?

Of course. Honestly, at the moment, I'm neutral on whether jsone should adopt this feature as I'm a bit afraid that could increase the code complexity (and might decrease the performance). However, I think that we can discuss that in the PR.

BTW, this is just out of curiosity, but are there any reasons you could not use jsx for your use case despite jsx already providing the feature?

sile avatar Apr 07 '22 04:04 sile

I'm a bit afraid that could increase the code complexity (and might decrease the performance).

My idea is to hook into the error cases, where the remaining input is <<>>. There, if the stream option is set, we return a continuation instead of an error. Hopefully it won't affect the performance of non-error cases.
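
A minimal sketch of that idea, not jsone's actual code: the module name `stream_sketch`, the helper `incomplete/2` and the toy grammar (a flat array of non-negative integers, no whitespace) are all assumptions made for illustration. The only place the `stream` option is consulted is the clause that fires when the input runs out, so the clauses for well-formed, complete input are unchanged.

```erlang
-module(stream_sketch).
-export([decode/2]).

%% Toy grammar only: a flat array of non-negative integers, e.g. <<"[1,22,3]">>.
%% Returns {ok, List} | {incomplete, Fun} | {error, badarg}.
decode(<<"[", Rest/binary>>, Opts) -> first(Rest, Opts);
decode(<<>>, Opts) -> incomplete(fun(More) -> decode(More, Opts) end, Opts);
decode(_, _) -> {error, badarg}.

%% Right after "[": either the array closes or the first number begins.
first(<<"]", _/binary>>, _Opts) -> {ok, []};
first(<<>>, Opts) -> incomplete(fun(More) -> first(More, Opts) end, Opts);
first(Bin, Opts) -> digits(Bin, none, [], Opts).

%% Inside a number. N is the value so far (none = no digit seen yet).
digits(<<C, Rest/binary>>, N, Acc, Opts) when C >= $0, C =< $9 ->
    digits(Rest, add_digit(N, C), Acc, Opts);
digits(<<",", Rest/binary>>, N, Acc, Opts) when is_integer(N) ->
    digits(Rest, none, [N | Acc], Opts);
digits(<<"]", _/binary>>, N, Acc, _Opts) when is_integer(N) ->
    {ok, lists:reverse([N | Acc])};
digits(<<>>, N, Acc, Opts) ->
    %% Formerly a plain error case: the input ended in the middle of a value.
    incomplete(fun(More) -> digits(More, N, Acc, Opts) end, Opts);
digits(_, _, _, _) -> {error, badarg}.

add_digit(none, C) -> C - $0;
add_digit(N, C) -> N * 10 + (C - $0).

%% The hook: with `stream` set, return a continuation that captures the
%% parser state; without it, keep the old behaviour and return an error.
incomplete(Resume, Opts) ->
    case lists:member(stream, Opts) of
        true ->
            {incomplete, fun(end_stream) -> {error, badarg};
                            (More) when is_binary(More) -> Resume(More)
                         end};
        false ->
            {error, badarg}
    end.
```

With this sketch, `decode(<<"[1,2">>, [stream])` returns `{incomplete, F}`, and feeding `F` the remaining chunks resumes from the captured state rather than re-parsing. Unlike the jsx session above, the sketch returns the result as soon as the closing `]` arrives instead of waiting for `end_stream`; that is a simplification, not a proposal.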

any reasons you could not use jsx for your use case despite jsx already providing the feature?

jsone has better performance. Since the point of this is to improve performance, it'd be unfortunate to switch to a slower lib. :-)

Regarding performance in general, jsone is one of the best (if NIFs are not allowed). Jason is apparently slightly faster and I was interested in why. Thoas is an Erlang port of Jason and it's actually very similar to jsone, but it has some optimisations that might be possible to use in jsone too, if you're interested.

zuiderkwast avatar Apr 07 '22 08:04 zuiderkwast

My idea is to hook into the error cases, ...

Sounds promising!

it has some optimisations that might be possible to use in jsone too, if you're interested.

This is very useful information. Thanks! (I'll consider whether those techniques can be applied to jsone when I have time.)

sile avatar Apr 07 '22 09:04 sile