jsonschema-rs

simd support?

Open jqnatividad opened this issue 3 years ago • 6 comments

Already, jsonschema-rs is quite performant.

However, have you looked into using crates like simd-json, simdutf8 to make it even faster?

jqnatividad, May 03 '22 16:05

Yes, I am actively looking into these things and want to publish a design document to get feedback on the implementation. It will also serve as a rough roadmap to 1.0 and will cover at least the following areas:

  • Keywords layout. As described in #212. I started a complete rewrite in a separate repo, and this change alone yields a ~50% reduction in validation time in some benchmarks and simplifies the code dramatically (it also uses enum_dispatch). It is incomplete, but it unlocks a lot: for example, there will be no need for RwLock in $ref, as it will be possible to evaluate them at the compilation phase.
  • Custom input types. This seems to be the way to support the crates you mentioned, plus other external types (like Python ones). I'm not sure what the best way to do this would be :( My attempts to wrap serde_json::Value without sacrificing too much were not successful.
  • Real error iterator. Right now there are tons of unnecessary allocations on each validate call, and all the flat_map calls are responsible for long compile times (according to llvm-lines). I'd like to have a tree iterator that doesn't allocate intermediate vectors, though I'm not sure about the right way to suspend and resume such a process. Maybe a separate table of state machine transitions would work for this.
  • Avoid the extra cost of SchemaNode. It is not needed for is_valid and validate calls, but it adds overhead.
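As an illustration of the first item, here is a minimal sketch of enum-based keyword dispatch with `$ref` resolved to an index ahead of time. It is not the code from the rewrite: `Value`, `Keyword`, and `Schema` are hypothetical names, `Value` is a tiny stand-in for serde_json::Value, and the `match` is written by hand where the enum_dispatch crate would generate it from a trait.

```rust
// Hypothetical stand-in for serde_json::Value, so the sketch needs no crates.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Null,
    Number(f64),
    String(String),
}

// Each keyword is an enum variant rather than a boxed trait object, so
// validation is a `match` instead of a virtual call.
#[derive(Debug)]
enum Keyword {
    Minimum(f64),
    MaxLength(usize),
    // A `$ref` that was resolved to an index into `Schema::nodes` when the
    // schema was compiled; no lock is needed to follow it at validation time.
    Ref(usize),
}

// A compiled schema: a flat arena of nodes, each a list of keywords.
// Node 0 is assumed to be the root.
struct Schema {
    nodes: Vec<Vec<Keyword>>,
}

impl Schema {
    // Note: this sketch does not guard against cyclic references.
    fn is_valid_at(&self, node: usize, instance: &Value) -> bool {
        self.nodes[node].iter().all(|kw| match (kw, instance) {
            (Keyword::Minimum(min), Value::Number(n)) => n >= min,
            (Keyword::MaxLength(max), Value::String(s)) => s.chars().count() <= *max,
            (Keyword::Ref(target), _) => self.is_valid_at(*target, instance),
            // A keyword does not apply to instances of other types.
            _ => true,
        })
    }

    fn is_valid(&self, instance: &Value) -> bool {
        self.is_valid_at(0, instance)
    }
}
```

Because the reference is a plain index fixed at compile time, the schema is immutable during validation and can be shared across threads without synchronization, which is the property the RwLock remark above is after.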

I expect to have it ready in a few days, and it is roughly my roadmap for this lib :) I'd appreciate it if you could share your thoughts on this, or your use case for integrating the crates you mentioned.
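For the "real error iterator" item, one possible shape is a depth-first iterator that keeps its position on an explicit stack, so it can stop after each error and resume on the next call. This is only a sketch under assumptions: `Node` and `Errors` are hypothetical names, instances are plain `f64` values, and errors are `String`s rather than the library's error type.

```rust
// Hypothetical compiled schema node: a leaf check or a group of children.
enum Node {
    Minimum(f64),
    Maximum(f64),
    // For example `allOf`: every child is checked.
    All(Vec<Node>),
}

// Lazy depth-first traversal. The only buffer is the stack of pending
// nodes; no per-node vector of errors is built and then flattened.
struct Errors<'a> {
    stack: Vec<&'a Node>,
    instance: f64,
}

impl<'a> Errors<'a> {
    fn new(root: &'a Node, instance: f64) -> Self {
        Errors { stack: vec![root], instance }
    }
}

impl<'a> Iterator for Errors<'a> {
    type Item = String;

    fn next(&mut self) -> Option<String> {
        // The stack is the suspended state: each call continues where the
        // previous one returned.
        while let Some(node) = self.stack.pop() {
            match node {
                Node::Minimum(min) if self.instance < *min => {
                    return Some(format!(
                        "{} is less than the minimum of {}",
                        self.instance, min
                    ));
                }
                Node::Maximum(max) if self.instance > *max => {
                    return Some(format!(
                        "{} is greater than the maximum of {}",
                        self.instance, max
                    ));
                }
                Node::All(children) => {
                    // Push in reverse so children are visited in order.
                    self.stack.extend(children.iter().rev());
                }
                // The leaf check passed: nothing to report.
                _ => {}
            }
        }
        None
    }
}
```

A caller that only needs a yes/no answer can ask for `Errors::new(&schema, x).next().is_none()` and stop at the first failure. A transitions table, as suggested above, would be a different way to hold the same suspended state.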

Stranger6667, May 03 '22 17:05

Sorry I didn't get back earlier, but thanks for your thorough response!

I don't know enough about your implementation to comment cogently on your points, but the details I can tease out indicate that there's a lot of headroom the library can exploit to squeeze out more performance.

I'm looking forward to the design document!

What I can contribute are my use-cases.

Currently, I'm using jsonschema-rs to validate CSV files (which is why I originally asked about #339), and with rayon, the performance is already quite impressive.

https://github.com/jqnatividad/qsv/issues/164

But as the flamegraph shows, any incremental performance gain in jsonschema will further accelerate qsv's validate command.

I plan to leverage the qsv validate command in another project - https://github.com/dathere/datapusher-plus to validate CSV files before they are uploaded to CKAN.
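To make the use case concrete, here is a rough sketch of validating rows in parallel. It is not qsv's code: it uses `std::thread::scope` in place of rayon so that it has no dependencies, and `is_valid_row` is a hypothetical stand-in for calling a compiled JSON Schema on the JSON form of each row.

```rust
use std::thread;

// Hypothetical per-row check standing in for a compiled schema's validity
// test: the row must have two fields, the second of which is numeric.
fn is_valid_row(row: &str) -> bool {
    let fields: Vec<&str> = row.split(',').collect();
    fields.len() == 2 && fields[1].trim().parse::<f64>().is_ok()
}

// Splits the rows into roughly `workers` chunks, validates each chunk on
// its own thread, and returns the indices of invalid rows in ascending order.
fn invalid_rows(rows: &[&str], workers: usize) -> Vec<usize> {
    let workers = workers.max(1);
    // Ceiling division, with a floor of 1 because `chunks(0)` panics.
    let chunk_len = ((rows.len() + workers - 1) / workers).max(1);
    thread::scope(|scope| {
        // Spawn every worker before joining any of them.
        let handles: Vec<_> = rows
            .chunks(chunk_len)
            .enumerate()
            .map(|(chunk_idx, chunk)| {
                scope.spawn(move || {
                    chunk
                        .iter()
                        .enumerate()
                        .filter(|(_, row)| !is_valid_row(row))
                        .map(|(i, _)| chunk_idx * chunk_len + i)
                        .collect::<Vec<usize>>()
                })
            })
            .collect();
        // Joining in spawn order keeps the indices sorted.
        handles
            .into_iter()
            .flat_map(|h| h.join().expect("worker thread panicked"))
            .collect()
    })
}
```

Since every row is checked independently, the per-row cost of the validator dominates, which is why a faster validator translates fairly directly into faster overall runs.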

jqnatividad, May 07 '22 12:05

@Stranger6667

> https://github.com/Stranger6667/jsonschema-rs/issues/212. I started a complete rewrite in a separate repo and just this change yields ~50% validation time reduction in some benchmarks + simplifies the code dramatically (it also uses enum_dispatch). Though it is incomplete but unlocks a lot - for example, there will be no need for RwLock in $ref as it will be possible to evaluate them at the compilation phase.

That sounds really interesting! Is that repo publicly available?

manuschillerdev, May 25 '22 15:05

@manuschillerdev I added it as a separate crate here - #373 :) It is a prototype, but ref resolving is more or less ready

Btw, @jqnatividad thanks for sharing your use case! I hope that soon we all can benefit from faster validation! :)

The changes are quite large, though, and I’d appreciate any help there :)

Stranger6667, May 25 '22 15:05

@Stranger6667 I'll start testing the jsonschema-csr prototype and will let you know my findings!

I need to update qsv's benchmarks soonish, and I'll be sure to include the prototype in them when I do.

And once I grok the internals, you can be sure I'll try to help as best as I can.

jqnatividad, May 27 '22 10:05

@jqnatividad Thank you! The currently submitted version is not working yet, but I am slowly working on it :)

Stranger6667, May 27 '22 10:05