
Add support for newline-delimited JSON (NDJSON)

Open amanjeev opened this issue 4 years ago • 3 comments

Summary

Working through #188 as an exercise showed that support for newline-delimited JSON (NDJSON) is not implemented in this crate.

Why

This feature is helpful when you have a large number of records and each record is a small JSON object on its own line. This is often the case with large JSON files, and looping over the lines and calling simd-json on each one is not going to help. @Licenser explains why in this comment:

Ja the lines are fairly short too the advantages are a lot smaller (sometimes detrimental) as there is an initial cost to pay for filling the registers, doing multiple runs etc. can overshadow the performance gain for very small payloads.

@Licenser also adds:

NDJSON would be incredibly cool (especially if we manage to realize in a streaming fashion / as an iterator)

What

Upstream simdjson has this feature called parse_many. Porting that to this crate is the first step.

!!!NEEDS MORE DETAILS!!!

amanjeev • May 15 '21 20:05

Just sketching something here. An API that would be really nice would be something like this (not valid Rust syntax, just pseudocode!):

fn iter_lines(r: Read) -> Iter<simd_json::DeserializableType>;

for item in iter_lines(file) {
    do_stuff(item)
}
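
For illustration, here is a minimal sketch of how that iterator shape could be written in compilable Rust, using only the standard library. Everything in it is an assumption and not part of simd-json: `iter_lines`, `LineError`, and the stand-in parser `parse_int` are hypothetical names. The parser is passed in as a closure that takes `&mut [u8]`, since simd-json parses mutable buffers in place. Note that this still parses one line at a time, so it shows only the API shape and not the batched performance the issue is asking for.

```rust
use std::io::{self, BufRead};

/// Hypothetical error type: either reading a line failed or parsing it did.
#[derive(Debug)]
pub enum LineError<E> {
    Io(io::Error),
    Parse(E),
}

/// Hypothetical helper (not a simd-json API): lazily yields one parsed
/// record per non-blank line of `reader`, using the supplied `parse` function.
pub fn iter_lines<R, T, E, F>(
    reader: R,
    mut parse: F,
) -> impl Iterator<Item = Result<T, LineError<E>>>
where
    R: BufRead,
    F: FnMut(&mut [u8]) -> Result<T, E>,
{
    reader.split(b'\n').filter_map(move |line| match line {
        Err(e) => Some(Err(LineError::Io(e))),
        Ok(mut bytes) => {
            // Tolerate Windows line endings.
            if bytes.last() == Some(&b'\r') {
                bytes.pop();
            }
            // Skip blank lines instead of reporting them as parse errors.
            if bytes.iter().all(|b| b.is_ascii_whitespace()) {
                return None;
            }
            Some(parse(&mut bytes[..]).map_err(LineError::Parse))
        }
    })
}

/// Stand-in parser for this sketch: accepts only a bare JSON integer.
/// A real JSON parser would be plugged in here instead.
pub fn parse_int(bytes: &mut [u8]) -> Result<i64, String> {
    let text = std::str::from_utf8(bytes).map_err(|e| e.to_string())?;
    text.trim().parse::<i64>().map_err(|e| e.to_string())
}
```

With a file, usage would look roughly like `for item in iter_lines(BufReader::new(file), parse_int) { do_stuff(item) }`. A parser with the same `&mut [u8]` shape, such as simd-json's `to_owned_value`, could presumably be passed in place of `parse_int`, though that pairing is untested here.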

Licenser • May 17 '21 10:05

See #124 for additional details.

Licenser • Oct 21 '22 12:10

Superseded by #349.

Licenser • Oct 29 '23 11:10