
Streaming Parser?

Open ex3ndr opened this issue 6 years ago • 9 comments

This is an awesome library, but is it possible to stream byte arrays somehow, in order to parse super large JSON files?

ex3ndr avatar Mar 14 '18 10:03 ex3ndr

Well, in theory this is possible, but I'm not sure what the best way to implement the right interface would be. It would be really cool to see how you envision this being implemented in the context of jsonparser.

buger avatar Mar 14 '18 11:03 buger

Also note that even now it should be possible to implement something like this, because jsonparser operates on a []byte structure, and basic functions like Get and ObjectEach return an offset field, which is basically just an array index into the original array. So you read until you get an error, remember the last index, and continue when you get more data by calling the function again with a []byte slice containing the new data.
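
For illustration, a minimal sketch of that resume-and-retry idea could look like this (the key name, chunk sizes, and error handling are assumptions for the example, not part of jsonparser):

```go
package example

import (
	"fmt"
	"io"

	"github.com/buger/jsonparser"
)

// readValueStreaming keeps appending chunks from r to a buffer and retries the
// lookup whenever jsonparser fails, which is what happens while the value has
// not fully arrived yet. "name" is just an example key.
func readValueStreaming(r io.Reader) ([]byte, error) {
	buf := make([]byte, 0, 64*1024)
	chunk := make([]byte, 32*1024)

	for {
		n, rerr := r.Read(chunk)
		buf = append(buf, chunk[:n]...)

		// Get also returns an offset, which is just an index into buf, so a
		// caller that processes many values can remember it and resume from
		// there on the next attempt instead of re-scanning from the start.
		value, _, _, perr := jsonparser.Get(buf, "name")
		if perr == nil {
			return value, nil
		}

		if rerr == io.EOF {
			return nil, fmt.Errorf("stream ended before value was complete: %w", perr)
		}
		if rerr != nil {
			return nil, rerr
		}
	}
}
```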

buger avatar Mar 14 '18 11:03 buger

I ran into the same problem.

There's a very large base64-encoded string value in a dict, and I need to stream that string to a socket peer while parsing.

Generally speaking, there has to be a limit on the length of a single key and on the nesting depth, and those are short; if we could make values (strings especially) streamable, that would be enough.

Do you have any ideas for such a scenario?

rocaltair avatar Apr 11 '18 10:04 rocaltair

@rocaltair there is an internal function called searchKey, whose main job is to return an offset pointing to the value of a key. So if we made this function public, you could use it to find the offset of the key's value, and from there use standard Go bytes tools to iterate through the data until you meet the closing " symbol (the end of the value). If you think it makes sense, create a simple PR that exports this function (converts its name to upper case).
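
To illustrate, once such an exported searchKey hands back the offset, the rest is plain byte scanning. The sketch below assumes the offset points at the opening `"` of the string value; the offset source is hypothetical until such a PR lands:

```go
package example

import (
	"errors"
	"io"
)

// streamStringValue writes the contents of a JSON string value to w, given the
// offset of its opening quote inside data (e.g. as returned by an exported
// searchKey). Escaped quotes inside the string are skipped over.
func streamStringValue(data []byte, offset int, w io.Writer) error {
	if offset >= len(data) || data[offset] != '"' {
		return errors.New("offset does not point at a string value")
	}
	start := offset + 1
	for i := start; i < len(data); i++ {
		switch data[i] {
		case '\\':
			i++ // skip the escaped character
		case '"':
			_, err := w.Write(data[start:i]) // end of value reached
			return err
		}
	}
	return errors.New("unterminated string: need more data")
}
```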

Cheers!

buger avatar Apr 11 '18 17:04 buger

That's not my point.

How about making the first parameter of each function an io.Reader instead of data []byte, and writing the value out to an io.Writer when it is a string?

We would use less memory that way, in my opinion.
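
To make that concrete, the proposed shape would be something like the purely hypothetical signature below; nothing like this exists in jsonparser today:

```go
package example

import "io"

// GetStream sketches the proposal above: the document is read incrementally
// from r, and when the value at keys is a string its bytes are copied straight
// into w instead of being returned, so the whole document never has to sit in
// memory at once. The implementation (incremental tokenizing of r) is omitted.
func GetStream(r io.Reader, w io.Writer, keys ...string) error {
	return nil
}
```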

rocaltair avatar Apr 12 '18 01:04 rocaltair

Thanks for developing and maintaining this library! One thought I'd add to this.

> [...] is it possible to stream byte arrays somehow, in order to parse super large JSON files
>
> Well, in theory this is possible, but I'm not sure what the best way to implement the right interface would be.

I'd just want a simple JSON lexer/tokenizer that pulls bytes from an io.Reader. I can easily write a simple parser on top of that. There would be two benefits:

  1. Always single-pass, even for large, complex nested structures
  2. Doesn't require loading all JSON into memory at once

I can't find any golang library that can do this today. Admittedly, this level of efficiency doesn't matter for most use cases.
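
To make the shape of that concrete, here is a purely hypothetical sketch of such a tokenizer interface and the single-pass loop it would enable (not an existing package):

```go
package example

import "io"

type TokenKind int

const (
	ObjectStart TokenKind = iota
	ObjectEnd
	ArrayStart
	ArrayEnd
	Key
	String
	Number
	Bool
	Null
)

// Token is one lexical element of the input. Value would point into an
// internal buffer and only be valid until the next call to Next.
type Token struct {
	Kind  TokenKind
	Value []byte
}

// Tokenizer pulls bytes from an io.Reader and emits tokens one at a time, so a
// caller can build any parser on top of it without loading the whole document
// into memory.
type Tokenizer interface {
	Next() (Token, error) // returns io.EOF when the input is exhausted
}

// NewTokenizer would wrap r; the implementation is left out of this sketch.
func NewTokenizer(r io.Reader) Tokenizer { return nil }

// consume shows the single-pass usage: every token is seen exactly once,
// regardless of how deeply nested the document is.
func consume(t Tokenizer) error {
	for {
		tok, err := t.Next()
		if err == io.EOF {
			return nil
		}
		if err != nil {
			return err
		}
		_ = tok // hand the token to whatever parser sits on top
	}
}
```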

igor0 avatar Jan 20 '21 01:01 igor0

@igor0 like this https://pkg.go.dev/github.com/valyala/fastjson?utm_source=godoc#Scanner ?

G2G2G2G avatar Apr 29 '22 02:04 G2G2G2G

@G2G2G2G That's still not quite like what I'm talking about. Let's say that I have this input:

{"article": "Article1", "sections": [{"id": "1"}, {"id": 2}]}
{"article": "Article2"}

As I understand it, fastjson Scanner will give me two strings, one for each article record. Then, I need subsequent scans to parse each of those strings. So, I end up with multiple scans in order to parse the JSON. In the worst case, this is quadratic complexity (although admittedly that would have to be pretty horrible JSON).

I'd just want to get a stream of tokens so that I can parse in a single pass. At least at the time I looked, there wasn't such a Go library available.

igor0 avatar Apr 29 '22 18:04 igor0

Oh I see, yeah, I understand why nothing like that is around. I guess it would have to piece things together the way Scanner does, since otherwise malformed or incorrect JSON could be sitting somewhere in the middle of the stream.

G2G2G2G avatar Apr 30 '22 01:04 G2G2G2G