jsonlines

Link to (new) Wikipedia 'JSON Streaming' article

Open · timbunce opened this issue 10 years ago · 9 comments

I started writing an issue suggesting that you link to the Line Delimited JSON article on Wikipedia, and perhaps help to clean it up a little.

The more I looked at it, however, the more I realized that it wasn't a good foundation.

So I ended up writing a new Wikipedia article myself: JSON Streaming

I think it's much more informative and balanced (naturally). I'd be grateful if you'd review it and, if you're happy with the content, link to it from jsonlines.org. If you spot something that needs changing or adding, please go ahead and edit the article yourself. In fact, doing so anyway, even in some small way, will help the article when the Wikipedians get around to reviewing it.

timbunce, Sep 28 '14 21:09

@timbunce Under "Applications of concatenated JSON" I just see a bunch of JSON libraries. What actual applications are using it?

I ask because concatenated JSON seems a little silly to me. If you want pretty-printed JSON and you use a streaming JSON parser, why not just stream a big JSON list? (You shouldn't need a new format at all.)

wardi, Sep 28 '14 22:09

"Applications" isn't a good title for that section. Got a better suggestion?

> why not just stream a big JSON list

(By 'list' I assume you don't mean wrapping the objects in a JSON array [ ... ].)

Concatenated JSON isn't a new format. It's just giving a name to streaming JSON without any delimiter at all:

```console
$ echo '{"some":"thing\n"}[42]{"may":{"include":"nested","objects":["and","arrays"]}}' | jq .
{
  "some": "thing\n"
}
[
  42
]
{
  "may": {
    "include": "nested",
    "objects": [
      "and",
      "arrays"
    ]
  }
}
```

Does that clarify it?

timbunce, Sep 29 '14 07:09

Yes, I mean wrapping the objects in a JSON array. Can a streaming JSON parser not give you one element at a time?
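For illustration, here is a minimal Python sketch of a streaming reader that hands back one array element at a time. It assumes the whole stream is already in a string (a real reader would pull from a socket), and `iter_array_elements` is a hypothetical name, not an existing library function. Note that it has to see the opening `[` before it can yield anything.

```python
import json

_JSON_WS = " \t\r\n"  # the four whitespace characters JSON allows between tokens


def _skip_ws(text, pos):
    """Return the index of the next non-whitespace character at or after pos."""
    while pos < len(text) and text[pos] in _JSON_WS:
        pos += 1
    return pos


def iter_array_elements(text):
    """Yield the elements of a top-level JSON array one at a time (hypothetical helper)."""
    decoder = json.JSONDecoder()
    pos = _skip_ws(text, 0)
    # Nothing can be yielded until the opening bracket has been seen.
    if pos >= len(text) or text[pos] != "[":
        raise ValueError("stream does not start with '['")
    pos = _skip_ws(text, pos + 1)
    if pos < len(text) and text[pos] == "]":
        return  # empty array
    while True:
        # raw_decode parses exactly one value and returns the index where it stopped.
        value, pos = decoder.raw_decode(text, pos)
        yield value
        pos = _skip_ws(text, pos)
        if pos >= len(text):
            raise ValueError("array is not terminated with ']'")
        if text[pos] == "]":
            return
        if text[pos] != ",":
            raise ValueError(f"expected ',' or ']' at index {pos}")
        pos = _skip_ws(text, pos + 1)


print(list(iter_array_elements('[{"a": 1}, [42], "x"]')))
# [{'a': 1}, [42], 'x']

# A reader that starts after the '[' was sent can't even begin:
try:
    list(iter_array_elements('{"a": 1}, [42]]'))
except ValueError as exc:
    print("error:", exc)
# error: stream does not start with '['
```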

wardi, Sep 29 '14 12:09

I've changed "Applications" to "Applications and Tools".

The need for the artificial [ at the start is a problem. Imagine a publish-subscribe model such as ZeroMQ: there's no simple way to add the artificial [ when a subscriber connects. The JSON objects will simply start streaming in.

A good streaming JSON parser ought to be able to handle concatenated JSON, or be tricked into it by resetting the parser state when each top-level object is completed.
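A minimal Python sketch of the "reset after each top-level value" approach, using the standard library's `json.JSONDecoder.raw_decode`, which parses one value and returns the index where it stopped. `iter_concatenated` is a hypothetical helper, and the sketch assumes the stream is already in memory as a `str`.

```python
import json


def iter_concatenated(text):
    """Yield each top-level JSON value from a concatenated-JSON string (hypothetical helper)."""
    decoder = json.JSONDecoder()
    pos = 0
    while True:
        # Skip whitespace (including pretty-printing newlines) between values;
        # raw_decode does not skip leading whitespace itself.
        while pos < len(text) and text[pos] in " \t\r\n":
            pos += 1
        if pos == len(text):
            return
        # Parse one complete value, then start afresh at the offset where it
        # ended: this is the "reset the parser after each top-level value" step.
        value, pos = decoder.raw_decode(text, pos)
        yield value


# Same input as the jq example above (raw string, so \n stays a JSON escape).
stream = r'{"some":"thing\n"}[42]{"may":{"include":"nested","objects":["and","arrays"]}}'
for value in iter_concatenated(stream):
    print(value)
# {'some': 'thing\n'}
# [42]
# {'may': {'include': 'nested', 'objects': ['and', 'arrays']}}
```

Because there's no delimiter, adjacent bare numbers need whitespace between them: `1 2` is two values, but `12` is one.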

timbunce, Sep 29 '14 13:09

So to me it feels like a hack that's specific to certain JSON encoders/parsers. If you can't figure out the framing without parsing the content, it's not really framing.
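To make the framing contrast concrete: in JSON Lines every record sits on its own line, and any newline inside a JSON string must already be escaped as `\n`. A reader can therefore find record boundaries by splitting on the newline byte without parsing anything. A minimal Python sketch, where `split_records` and `decode_records` are hypothetical helper names:

```python
import json


def split_records(data: bytes):
    """Framing only: cut at each newline byte without looking inside records.

    Blank lines (e.g. after the trailing newline) are dropped.
    """
    return [line for line in data.split(b"\n") if line.strip()]


def decode_records(data: bytes):
    """Decode each framed record separately; a bad one doesn't affect the rest."""
    good, bad = [], []
    for record in split_records(data):
        try:
            good.append(json.loads(record))  # json.loads accepts bytes (Python 3.6+)
        except json.JSONDecodeError:
            bad.append(record)
    return good, bad


stream = b'{"id": 1}\n{"id": 2, oops}\n[3]\n'
good, bad = decode_records(stream)
print(good)  # [{'id': 1}, [3]]
print(bad)   # [b'{"id": 2, oops}']
```

The malformed record is confined to its own line, and the reader still knows where the next one starts. With concatenated JSON, a parse error leaves no reliable place to resume.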

wardi, Sep 29 '14 13:09

Also, if we're talking about streaming, why do we care about having it pretty-printed? That can be handled on the receiving end if someone is interested.

If we're talking about a format suitable for editing, it needs to be a complete file anyway, so a big JSON array seems to fit better.

wardi, Sep 29 '14 14:09

The stream may already be pretty-printed and out of the reader's control. The Wikipedia page simply aims to explain the two main forms of JSON Streaming. Is there anything you'd like to see added or changed?

timbunce, Sep 29 '14 21:09

@timbunce Yeah, just some real-world examples of people using the pretty-printed form. To me, pretty-printed concatenated JSON seems like a really hard format to deal with.

wardi, Oct 03 '14 12:10

I don't think people would choose to use that form for data processing if they had a choice. I've dealt with cases where I had a pile of files, each containing pretty-printed JSON. Being able to just `cat *.json | jq ...` is great, and `cat *.json | jq -c .` is sufficient to turn the JSON back into 'jsonlines' form.
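For readers without jq, here is a rough Python equivalent of `cat *.json | jq -c .`, assuming UTF-8 files that fit in memory. `to_json_lines` and `files_to_json_lines` are hypothetical names, and number formatting may differ from jq's output in edge cases.

```python
import json


def to_json_lines(text):
    """Re-emit each top-level value of a concatenated-JSON string as one
    compact line, roughly what `jq -c .` prints."""
    decoder = json.JSONDecoder()
    lines, pos = [], 0
    while True:
        # Skip the whitespace and newlines left over from pretty-printing.
        while pos < len(text) and text[pos] in " \t\r\n":
            pos += 1
        if pos == len(text):
            break
        value, pos = decoder.raw_decode(text, pos)
        # separators=(",", ":") drops the spaces json.dumps adds by default.
        lines.append(json.dumps(value, separators=(",", ":"), ensure_ascii=False))
    return "".join(line + "\n" for line in lines)


def files_to_json_lines(paths):
    """Hypothetical stand-in for `cat *.json | jq -c .` on an explicit path list,
    e.g. files_to_json_lines(sorted(glob.glob("*.json")))."""
    parts = []
    for path in paths:
        with open(path, encoding="utf-8") as f:
            parts.append(f.read())
    return to_json_lines("".join(parts))


pretty = '{\n  "a": 1,\n  "b": [1, 2]\n}\n{\n  "c": null\n}\n'
print(to_json_lines(pretty), end="")
# {"a":1,"b":[1,2]}
# {"c":null}
```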

timbunce, Oct 03 '14 12:10