Flattened JSON access for the tape

Open poonai opened this issue 6 years ago • 15 comments

Does simdjson have flattened JSON access? (similar to https://github.com/pikkr/pikkr)

Will there be any performance improvement if I use flattened JSON access?


Added by @Licenser as an issue description

The Tape struct should be queryable via a simplified version of JSONpath (section 3.2 in the paper linked below).

To achieve this we need at minimum:

  • [ ] a parser that takes a query string and turns it into a digestible format
  • [ ] a function that takes said format and applies it to a Tape
  • [ ] support for .<field> to query an object field
  • [ ] support for [<index>] to query array indexes
  • [ ] support for nesting those two
  • [ ] sufficient tests to cover the code (sufficient here is defined as 'does not drop crate coverage' or better)

Additional JSONpath operators are welcome but optional.
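
As a rough illustration of the first checklist item, here is a minimal sketch of a hand-written parser for the `.<field>` and `[<index>]` subset. The `Segment` type and the `parse_query` function are hypothetical names chosen for this example and are not part of simd-json; error reporting is reduced to `None` to keep the sketch short.

```rust
/// One step of a parsed query (hypothetical type, not part of simd-json).
#[derive(Debug, Clone, PartialEq)]
enum Segment {
    /// `.field`: look up a key in an object.
    Field(String),
    /// `[index]`: look up a position in an array.
    Index(usize),
}

/// Parses queries such as `.skills.language` or `.items[0].name`.
/// Returns `None` on malformed input instead of a detailed error.
fn parse_query(query: &str) -> Option<Vec<Segment>> {
    let mut segments = Vec::new();
    let mut rest = query;
    while !rest.is_empty() {
        if let Some(after_dot) = rest.strip_prefix('.') {
            // A field name runs until the next `.` or `[`.
            let end = after_dot
                .find(|c: char| c == '.' || c == '[')
                .unwrap_or(after_dot.len());
            if end == 0 {
                return None; // empty field name, e.g. `..` or a trailing `.`
            }
            segments.push(Segment::Field(after_dot[..end].to_string()));
            rest = &after_dot[end..];
        } else if let Some(after_bracket) = rest.strip_prefix('[') {
            // An index is the decimal number up to the closing `]`.
            let end = after_bracket.find(']')?;
            let index = after_bracket[..end].parse::<usize>().ok()?;
            segments.push(Segment::Index(index));
            rest = &after_bracket[end + 1..];
        } else {
            return None; // every segment must start with `.` or `[`
        }
    }
    Some(segments)
}
```

In this sketch every segment must start with `.` or `[`, so a bare `skills.language` is rejected while `.skills.language` is accepted; whether the leading dot should be optional is a design decision left open here.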

poonai avatar Dec 23 '19 13:12 poonai

I'm not super familiar with pikkr and will have to look at the paper it references when I find the time for it.

I think the tape parsing simd_json::to_tape will give you a flattened representation of the DOM and gets up to 2GB/s in parsing speed that way. But I'm not sure that's exactly what you're looking for.

That said, again, I have to do a bit more research to say one way or the other whether it'd bump performance, and I sure will :) Thanks for bringing this up!

Licenser avatar Dec 23 '19 14:12 Licenser

I want to do something like this.

Example json:

{
	"name": "Licenser",
	"skills": {
		"language": "Rust"
	}
}

In order to get the language:

The parser takes the flattened key skills.language as input and returns String("Rust").

I'm keeping this open for tracking <3.

Thanks for looking into the issue.

poonai avatar Dec 23 '19 15:12 poonai

With the tape that shouldn't be too hard, I think. It'd just be a traversal of the array while keeping track of nesting.
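
A rough sketch of what such a traversal could look like, assuming a simplified tape layout in which containers only record their element count. The `Node` and `Segment` types and the `skip` and `lookup` helpers are hypothetical and written for this example; the real simd-json tape type differs.

```rust
/// Simplified tape node (hypothetical; not simd-json's actual node type).
#[derive(Debug, Clone, PartialEq)]
enum Node {
    /// Object with `len` pairs; each pair is a `String` key node followed by its value.
    Object { len: usize },
    /// Array with `len` elements stored right after this node.
    Array { len: usize },
    String(String),
    Number(f64),
}

/// One step of an already-parsed query (hypothetical).
enum Segment {
    Field(String),
    Index(usize),
}

/// Returns the tape position just past the value that starts at `pos`.
/// This is where nesting is kept in check: containers skip all their children.
fn skip(tape: &[Node], pos: usize) -> Option<usize> {
    match tape.get(pos)? {
        Node::Object { len } => {
            let mut next = pos + 1;
            for _ in 0..*len {
                next = skip(tape, next + 1)?; // `+ 1` steps over the key node
            }
            Some(next)
        }
        Node::Array { len } => {
            let mut next = pos + 1;
            for _ in 0..*len {
                next = skip(tape, next)?;
            }
            Some(next)
        }
        _ => Some(pos + 1),
    }
}

/// Walks the tape from the root, following `path`, and returns the node found.
fn lookup<'t>(tape: &'t [Node], path: &[Segment]) -> Option<&'t Node> {
    let mut pos = 0;
    for segment in path {
        pos = match (tape.get(pos)?, segment) {
            (Node::Object { len }, Segment::Field(name)) => {
                let mut cursor = pos + 1;
                let mut found = None;
                for _ in 0..*len {
                    match tape.get(cursor)? {
                        Node::String(key) if key == name => {
                            found = Some(cursor + 1); // the value follows its key
                            break;
                        }
                        _ => cursor = skip(tape, cursor + 1)?,
                    }
                }
                found?
            }
            (Node::Array { len }, Segment::Index(i)) if i < len => {
                let mut cursor = pos + 1;
                for _ in 0..*i {
                    cursor = skip(tape, cursor)?;
                }
                cursor
            }
            _ => return None, // type mismatch or index out of bounds
        };
    }
    tape.get(pos)
}
```

With the example document laid out as `[Object{len: 2}, "name", "Licenser", "skills", Object{len: 1}, "language", "Rust"]`, following `skills` and then `language` lands on the `String("Rust")` node. A real tape that also stores the total size of each container could jump over nested values directly instead of recursing in `skip`.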

I really like the idea; it'd allow some flexibility on access. I'll mark it as a good first issue and help wanted. If you or anyone else is interested in grabbing it, I'll gladly put some time aside to pair on it or help otherwise.

Licenser avatar Dec 23 '19 15:12 Licenser

Renamed (as it moved from question to feature) and assigned to the 0.3 goal.

Licenser avatar Dec 23 '19 15:12 Licenser

@Licenser I was just looking at pikkr's benchmark(s), it looks like we might be able to do a quick apples-to-apples comparison pretty easily.

https://github.com/pikkr/pikkr/blob/master/benches/parser.rs

@balajijinnah I also wanted to mention that a comparison of the approaches (not necessarily the current implementations) is provided in the "related work" section of @lemire's SIMDJSON paper: https://arxiv.org/pdf/1902.08318.pdf

Thank you for sharing this!

sunnygleason avatar Dec 23 '19 15:12 sunnygleason

Also, a JSON Pointer tool is part of SIMDJSON; presumably, this could be ported to simdjson-rs:

https://github.com/lemire/simdjson/blob/master/tools/jsonpointer.cpp

sunnygleason avatar Dec 23 '19 15:12 sunnygleason

there is also this: https://github.com/pikkr/rust-json-parser-benchmark for benchmarks

Licenser avatar Dec 23 '19 15:12 Licenser

I put out a 'looking for contributors' call at https://users.rust-lang.org/t/twir-call-for-participation/4821/285. The issue is nicely self-contained and a great chance for someone to get their feet wet and perhaps learn or practice some Rust :)

Licenser avatar Dec 23 '19 17:12 Licenser

Hello, I would like to take this feature, is anyone already working on it?

miker1423 avatar Dec 27 '19 14:12 miker1423

Hi @miker1423, that's awesome :) And no, not to my knowledge; Sunny and I stayed away from it since it's such a nice one to get started with.

If you have any questions or get stuck, feel free to ask any time!

When you open a PR, just let us know how you prefer the review and what your goal is. If it's about learning, we'll gladly go over it line by line and add suggestions; if it's about contributing, we'll get it through with as little hassle as possible :).

Licenser avatar Dec 27 '19 15:12 Licenser

Thanks! I'll start as soon as possible.

miker1423 avatar Dec 27 '19 18:12 miker1423

Hello! I had trouble with my personal PC this month and I can't use my work PC for anything other than my employer's stuff, so I couldn't do a lot of work during the holidays, but I'm back on track with the issue 😄. Question: I've seen some implementations of JSONPath in Rust. Some use a library (https://crates.io/crates/pest) that takes a grammar (https://github.com/greyblake/jsonpath-rs) and produces a valid parser, and others implement the parser by hand (https://github.com/freestrings/jsonpath). Which would be the expected implementation for this project?

miker1423 avatar Jan 21 '20 19:01 miker1423

First of all, no worries :) Life happens to all of us and it should always have priority, we totally understand!

Pest vs. hand-rolled is a tough question. The syntax of JSONPath is quite simple compared to a full language, so building a custom parser isn't prohibitive (and might result in simpler code?). It also improves build time, since we can skip building pest itself. On the other hand, pest can be handy for making the grammar bulletproof, and since it's a well-known entity it might make the code easier for people to understand down the road; it probably also has better error messages out of the box.

If I were writing this I'd probably write my own parser, because saving compile time outweighs the simpler tooling pest would give me while building it, but I'm also very comfortable with custom parsers, so I'm biased. Plus I've had very little interaction with pest, and it'd probably take me more time to learn the ins and outs of it than to write the parser. On the other hand, without time constraints I might have just picked pest for the sake of learning it :).

Neither would be a bad choice, and since you're implementing it, it would make sense to pick what seems the best fit for you. In my experience, a clear understanding of why something was picked is often more important than what was picked, unless there is some very heavily weighted factor in favour of one or the other.

I suspect the JSONPath expression will be compiled to some kind of data structure before querying, so performance on that path is probably not a concern either.

I hope this no-answer is a helpful one :) I don't want to armchair-quarterback your implementation.

Licenser avatar Jan 22 '20 07:01 Licenser

I also think that writing the parser by hand is the better solution in this case, because compile time could be affected just because of pest, and with some proper unit testing I would be comfortable with the parser results. I'm also used to writing parsers, so I might also be biased :).

miker1423 avatar Jan 24 '20 18:01 miker1423

@miker1423 are you still working on this?

Licenser avatar Oct 02 '20 07:10 Licenser

I will close this in favor of #82, as it'd be a better approach to have the JSONPath access for the Value trait and then implement the Value trait for the tape.

Licenser avatar Oct 09 '23 12:10 Licenser