jsonschema-rs
simd support?
Already, jsonschema-rs is quite performant.
However, have you looked into using crates like simd-json, simdutf8 to make it even faster?
Yes, I am actively looking into these things and want to publish a design document to get feedback on the implementation. It will also serve as something of a roadmap to 1.0 and will cover at least the following areas:
- Keywords layout. As described in #212. I started a complete rewrite in a separate repo, and just this change yields a ~50% validation time reduction in some benchmarks and simplifies the code dramatically (it also uses enum_dispatch). It is incomplete, but it unlocks a lot: for example, there will be no need for RwLock in $ref, as it will be possible to evaluate them at the compilation phase.
- Custom input types. It seems like the way to support the crates you mentioned plus other external types (like Python ones). I'm not sure what the best way to do this would be :( My attempts to wrap serde_json::Value without sacrificing too much were not successful.
- Real error iterator. Right now there are tons of unnecessary allocations on each validate call, and all the flat_map calls are responsible for long compile times (according to llvm-lines). I'd like to have some tree iterator that doesn't allocate intermediate vectors, though I'm not sure about the right way to suspend/resume such a process. Maybe a separate state machine transitions table would work for this.
- Avoid extra costs of SchemaNode. It is not needed for is_valid and validate calls, but adds extra overhead.
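To make the keywords layout idea concrete, here is a minimal hand-written sketch of enum-based dispatch. It is not the code from the rewrite: the `Value`, `Keyword`, and `Schema` types are hypothetical, `Value` is only a tiny stand-in for serde_json::Value, and the `match` is written by hand rather than generated by the enum_dispatch crate.

```rust
// Hypothetical, simplified value type standing in for serde_json::Value.
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Null,
    Number(f64),
    String(String),
}

// Each keyword is an enum variant instead of a boxed trait object,
// so a compiled schema is a flat Vec and dispatch is a plain `match`.
#[derive(Debug)]
pub enum Keyword {
    Minimum(f64),
    MaxLength(usize),
    TypeString,
}

impl Keyword {
    pub fn is_valid(&self, instance: &Value) -> bool {
        match (self, instance) {
            (Keyword::Minimum(limit), Value::Number(n)) => n >= limit,
            (Keyword::MaxLength(limit), Value::String(s)) => s.chars().count() <= *limit,
            (Keyword::TypeString, v) => matches!(v, Value::String(_)),
            // A keyword ignores instance types it does not apply to.
            _ => true,
        }
    }
}

// Hypothetical compiled schema: every keyword must accept the instance.
pub struct Schema {
    keywords: Vec<Keyword>,
}

impl Schema {
    pub fn new(keywords: Vec<Keyword>) -> Self {
        Self { keywords }
    }

    pub fn is_valid(&self, instance: &Value) -> bool {
        self.keywords.iter().all(|k| k.is_valid(instance))
    }
}
```

The point of the layout is that the validator holds no locks or heap-allocated trait objects per keyword; whether this matches the prototype's actual structure is an assumption.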
I expect to have it in a few days, and it is roughly my roadmap for this lib :) I'd appreciate it if you could share your thoughts on this or share your use case for integrating the crates you mentioned.
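One possible shape for the "real error iterator" item is sketched below, under stated assumptions: the `Node`, `Error`, and `Errors` types and the `validate` function are hypothetical and far simpler than real schema nodes. The iterator keeps a single explicit stack of pending nodes, so the struct itself is the suspended state and no intermediate error vectors are built per node.

```rust
/// Hypothetical schema tree: leaves check an integer, `AllOf` groups children.
pub enum Node {
    Minimum(i64),
    Maximum(i64),
    AllOf(Vec<Node>),
}

/// Hypothetical error type; it is plain data, so producing one does not allocate.
#[derive(Debug, PartialEq)]
pub enum Error {
    BelowMinimum { limit: i64 },
    AboveMaximum { limit: i64 },
}

/// Lazy error iterator. The stack of pending nodes is the whole resumable state.
pub struct Errors<'a> {
    stack: Vec<&'a Node>,
    instance: i64,
}

pub fn validate<'a>(root: &'a Node, instance: i64) -> Errors<'a> {
    Errors { stack: vec![root], instance }
}

impl<'a> Iterator for Errors<'a> {
    type Item = Error;

    fn next(&mut self) -> Option<Error> {
        // Resume from wherever the previous call stopped.
        while let Some(node) = self.stack.pop() {
            match node {
                Node::Minimum(limit) if self.instance < *limit => {
                    return Some(Error::BelowMinimum { limit: *limit });
                }
                Node::Maximum(limit) if self.instance > *limit => {
                    return Some(Error::AboveMaximum { limit: *limit });
                }
                // Push children in reverse so they are visited in schema order.
                Node::AllOf(children) => self.stack.extend(children.iter().rev()),
                // The leaf check passed: nothing to report.
                _ => {}
            }
        }
        None
    }
}
```

With this shape, an `is_valid`-style check is just `validate(&schema, instance).next().is_none()`, which stops at the first error. A state machine transitions table, as mentioned above, would be a different design; this sketch only illustrates the stack-based variant.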
Sorry I didn't get back earlier, but thanks for your thorough response!
I don't know enough about your implementation to cogently comment on your points, but the details I can tease out indicate that there's a lot of headroom the library can exploit to squeeze out more performance.
I'm looking forward to the design document!
What I can contribute are my use-cases.
Currently, I'm using jsonschema-rs to validate CSV files (and that's why I originally asked about #339), and after using rayon, the performance is already quite impressive.
https://github.com/jqnatividad/qsv/issues/164
But as the flamegraph shows, any incremental performance gain in jsonschema-rs will further accelerate qsv's validate cmd.
I plan to leverage the qsv validate command in another project - https://github.com/dathere/datapusher-plus to validate CSV files before they are uploaded to CKAN.
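For readers unfamiliar with the pattern, here is a rough sketch of validating rows in parallel. It is not qsv's code: qsv uses rayon, while this sketch swaps in `std::thread::scope` from the standard library so that it needs no external crates. `row_is_valid` is a hypothetical stand-in for a call into a compiled jsonschema-rs validator, and its three-field rule is made up for illustration.

```rust
use std::thread;

/// Hypothetical stand-in for validating one CSV row against a compiled schema.
/// Assumed rule: exactly three non-empty comma-separated fields.
pub fn row_is_valid(row: &str) -> bool {
    let fields: Vec<&str> = row.split(',').collect();
    fields.len() == 3 && fields.iter().all(|f| !f.trim().is_empty())
}

/// Returns the indices of invalid rows, validating chunks on scoped threads.
pub fn invalid_rows(rows: &[&str], workers: usize) -> Vec<usize> {
    let workers = workers.max(1);
    // Ceiling division; at least 1 so `chunks` never receives 0.
    let chunk_size = ((rows.len() + workers - 1) / workers).max(1);

    thread::scope(|scope| {
        let handles: Vec<_> = rows
            .chunks(chunk_size)
            .enumerate()
            .map(|(chunk_idx, chunk)| {
                scope.spawn(move || {
                    let mut bad = Vec::new();
                    for (i, row) in chunk.iter().enumerate() {
                        if !row_is_valid(row) {
                            // Translate the chunk-local index to a global one.
                            bad.push(chunk_idx * chunk_size + i);
                        }
                    }
                    bad
                })
            })
            .collect();

        // Join in chunk order so the result stays sorted by row index.
        handles
            .into_iter()
            .flat_map(|h| h.join().expect("worker thread panicked"))
            .collect()
    })
}
```

Because each row is checked independently, any reduction in per-row validation cost carries over to every worker, which is why validator speedups matter even after parallelizing.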
@Stranger6667
> https://github.com/Stranger6667/jsonschema-rs/issues/212. I started a complete rewrite in a separate repo and just this change yields ~50% validation time reduction in some benchmarks + simplifies the code dramatically (it also uses enum_dispatch). Though it is incomplete but unlocks a lot - for example, there will be no need for RwLock in $ref as it will be possible to evaluate them at the compilation phase.
That sounds really interesting! Is that repo publicly available?
@manuschillerdev I added it as a separate crate here - #373 :) It is a prototype, but ref resolving is more or less ready
Btw, @jqnatividad thanks for sharing your use case! I hope that soon we all can benefit from faster validation! :)
The changes are quite large, though, and I'd appreciate any help there :)
@Stranger6667 I'll start testing the jsonschema-csr prototype and will let you know my findings!
I need to update qsv's benchmarks soonish and I'll be sure to include the prototype in it when I do.
And once I grok the internals, you can be sure I'll try to help as best as I can.
@jqnatividad Thank you! The currently submitted version is not working yet, but I am slowly working on it :)