
Wishlist (draft)

Open • knapply opened this issue 4 years ago • 4 comments

@eddelbuettel, enough things have accumulated (and, frankly, enough lessons learned on my end) that I'm rethinking the design from the ground up as time permits.

Looking toward the future, I'd like to consolidate the outstanding PR (https://github.com/eddelbuettel/rcppsimdjson/pull/70) and previous issues that have potential solutions (e.g., https://github.com/eddelbuettel/rcppsimdjson/issues/71) into a design/capability wishlist that helps future-proof the package. A list of everything that needs to be considered will help guide the redesign.

These are things I'm tracking now.

  • Sync w/ upstream simdjson
    • On-demand parser
  • NDJSON/JSONL support
  • con()nections
  • drop-in jsonlite::fromJSON() replacement
    • nested data frame columns (https://github.com/eddelbuettel/rcppsimdjson/issues/71)
  • drop-in jsonify::from_json() replacement?

knapply • Aug 04 '21 10:08
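For the NDJSON/JSONL item, here is a minimal sketch of the splitting step, written in C++ since that is the package's implementation language. It is not RcppSimdJson code, and `split_ndjson()` is a hypothetical name. It relies on the fact that valid JSON cannot contain a raw (unescaped) newline inside a string, so each non-blank line is exactly one document; every returned view could then be handed to a JSON parser. simdjson also provides `parse_many()` for document streams, which a real implementation would more likely build on.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical helper: split an NDJSON/JSONL buffer into one view per document.
// Assumes documents are separated by '\n' (optionally "\r\n"); blank or
// whitespace-only lines are skipped. The views point into `buffer`, so the
// buffer must outlive the returned vector.
std::vector<std::string_view> split_ndjson(std::string_view buffer) {
    std::vector<std::string_view> docs;
    std::size_t start = 0;
    while (start < buffer.size()) {
        std::size_t end = buffer.find('\n', start);
        if (end == std::string_view::npos) end = buffer.size();  // last line, no trailing newline
        std::string_view line = buffer.substr(start, end - start);
        if (!line.empty() && line.back() == '\r') line.remove_suffix(1);  // tolerate CRLF
        if (line.find_first_not_of(" \t") != std::string_view::npos) docs.push_back(line);
        start = end + 1;
    }
    return docs;
}
```

For example, `split_ndjson("{\"a\":1}\n\n[3]")` yields two views, `{"a":1}` and `[3]`; the empty middle line is skipped.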

@NicolasJiaxin is working on an On Demand prototype at https://github.com/lemire/rcppsimdjson/pull/1. The purpose is to prove that it can be done.

By the end of the summer, we should have simdjson 1.0. That would not affect https://github.com/eddelbuettel/rcppsimdjson/pull/70 much, since the DOM API did not change between 0.9 and 1.0 (it is quite stable at this point). However, it could make On Demand more appealing.

lemire • Aug 04 '21 15:08

All good, actually. I am not too concerned about the state of things. The combination of two orthogonal sets of wickedness in the simdjson library and the clever (and quickly written) package by @knapply means that we have something rather useful and performant. There will always be users asking for a shot of cream and two sugars to go along with the strong, freshly brewed coffee we serve over here, but we cannot always be all things to all people all the time, and for free.

A later redesign, during or after the 1.0 release, sounds good to me too.

eddelbuettel • Aug 04 '21 16:08

If memory serves, the obstacle for On Demand was the inability to obtain the size of arrays, but it looks like that was solved by array::count_elements() while I've been distracted elsewhere... https://github.com/simdjson/simdjson/blob/b79261eebcd7b9a784f1e2d17de904841713f80c/include/simdjson/generic/ondemand/array-inl.h#L92-L102

Awesome!

knapply • Aug 04 '21 16:08
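The array-size obstacle can be illustrated with a small, self-contained C++ sketch. It does not use simdjson and is not how `array::count_elements()` is implemented; `count_top_level_elements()` is a hypothetical helper that assumes well-formed JSON. It only shows why the length of an array is unknown until the array has been walked once, skipping over nested containers and strings. That length is presumably what a binding needs in order to allocate an R vector at its final size before filling it. As far as we can tell from the linked source, On Demand's `count_elements()` performs such a pass and then rewinds the array so it can still be iterated.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string_view>

// Hypothetical helper: count the direct elements of a JSON array text, e.g.
// "[1, [2, 3], {\"k\": \"a,b\"}]" has 3. Assumes well-formed JSON; returns
// std::nullopt if the text does not start with '[' or the array is unterminated.
std::optional<std::size_t> count_top_level_elements(std::string_view json) {
    std::size_t i = json.find_first_not_of(" \t\r\n");
    if (i == std::string_view::npos || json[i] != '[') return std::nullopt;
    std::size_t depth = 0;   // nesting depth; 1 means "directly inside the outer array"
    std::size_t count = 0;   // number of commas seen at depth 1
    bool in_string = false, escaped = false, saw_value = false;
    for (; i < json.size(); ++i) {
        char c = json[i];
        if (in_string) {  // commas and brackets inside strings are not structural
            if (escaped) escaped = false;
            else if (c == '\\') escaped = true;
            else if (c == '"') in_string = false;
            continue;
        }
        switch (c) {
            case '"':
                in_string = true;
                if (depth == 1) saw_value = true;
                break;
            case '[': case '{':
                if (depth == 1) saw_value = true;  // a nested container is one element
                ++depth;
                break;
            case ']': case '}':
                --depth;
                if (depth == 0) return count + (saw_value ? 1 : 0);  // closed the outer array
                break;
            case ',':
                if (depth == 1) { ++count; saw_value = false; }
                break;
            case ' ': case '\t': case '\r': case '\n':
                break;
            default:  // numbers, true/false/null
                if (depth == 1) saw_value = true;
                break;
        }
    }
    return std::nullopt;  // unterminated array
}
```

The extra pass is the cost of knowing the size up front; the alternative is growing a container while iterating and copying it into a fixed-size R vector at the end.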

@knapply Indeed. There might be other obstacles, but @NicolasJiaxin should stumble on them. If he manages to create the prototype, then we know it is probably all good.

lemire • Aug 04 '21 16:08