jiter
jiter copied to clipboard
Splitting value/python away from jiter's parser/decoders/iterators?
Would you consider splitting those into separate crates?
I'm asking because my crate now has its own Value type (it's not specifically tied to Python, it doesn't Arc all the things for example, it doesn't depend on ahash directly, etc.) and so it only uses the iterator API.
It would be neat to be able to share the parser/iterator between jiter & merde, so that if one adds x86_64 SIMD for example, the other would get it too, etc.
The python stuff is already behind a feature flag, so that's fine.
Out of interest, what do you use instead of Arc?
I'd be happy to accept a PR to make Value a feature. Related, @davidhewitt and I were discussing simplifying LazyIndexMap to instead just use a Vec or SmallVec anyway.
You can see my take on Value here: https://docs.rs/merde/latest/merde/enum.Value.html
(with CowStr’s owned type being compact_str if the feature is enabled)
I think for python it probably makes more sense to have everything reference counted, since those are more or less the semantics of any garbage collected language. i’m only thinking about rust API consumers in my case.
I did end up getting rid of the lazy index map because it’s a clever trick if you unconditionally deserialize to that, but in the case of merde the common case is to target a rust struct so there’s no hashing done at all
I'd be happy to accept a PR to make
Valuea feature. Related, @davidhewitt and I were discussing simplifyingLazyIndexMapto instead just use aVecorSmallVecanyway.
Are you dead-set on keeping it all one single crate? I know inter-crate optimization can be touchy (nothing thin-LTO + more aggressive #[inline] annotations cannot fix afaik) but I do believe there's a clean separation between the iterator interface and the value interface there that would be easier to maintain as two separate crates (for both of us).
Opposite question, What's the advantage of splitting?
Opposite question, What's the advantage of splitting?
Testing for combination of feature flags is famously hard to get right, the number of combinations explode, it's so easy to forget about a combination that doesn't work, and the whole editing experience is just awkward (e.g. code editor settings to have cargo check/clippy run with some feature flag combinations).
Dealing with optional dependencies is aggravating too, a lot of things are possible but eventually you're doing a lot of work just to avoid doing cargo publish twice — and that's something that's easily automated with something like https://release-plz.ieni.dev
Working on different component in isolation is also much nicer when they're split: say you're running cargo test on save or something, instead of the compiler having to consider the whole dependency tree, make sure all deps are downloaded, then parse everything and throw away anything that's gated by a cfg, it's only doing exactly the work it needs to do.
The "global" dependency tree will also be clearer: on crates.io / lib.rs, if you go on jiter-core or whatever, you'll see who depends on the core parser/iterator, rather than those who depends on the value/python side of things — feature flags are not tracked at all.
CVEs are reported against entire packages — imagine there's a vuln in jiter (the fat one, with python, value etc.) but not in jiter-core — a of packages would avoid the unnecessary noise.
Sometimes splitting up packages is touchy due to the lack of a clear interface between them. Here it's so clear I was honestly kinda surprised you asked for additional reasoning x) But I'm fairly happy with the reasons I gave, I can rack my brain for more if you'd like.
As of #184 I've removed the LazyIndexMap type. I think moving out the value into a jiter-value crate seems reasonable, eventually I may get to this as we refactor how pydantic-core consumes jiter 👍
For now merde uses a sort of internal fork of jiter — I would love to have it depend on the core instead. I'll keep tabs with this issue to see how things evolve! No pressure :)
As of #184 I've removed the
LazyIndexMaptype. I think moving out the value into ajiter-valuecrate seems reasonable, eventually I may get to this as we refactor howpydantic-coreconsumesjiter👍
Copied from another issue I opened because I couldn't find that one:
I would love to once again use jiter in another one of my packages, facet-json (see https://github.com/facet-rs/facet) — But we want it to be no_std compatible and minimal versions and everything and some of the dependencies pulled in just to support JSON value are actually failing to build properly. And looking at it closer, we don't actually need bitvec if we're not using JSON value. What's the status on separating those?