[DRAFT] indexmap support
Summary
When the preserve_order feature is enabled, JSON objects now use IndexMap instead of HashMap to maintain insertion order. This is visible for objects with more than 32 keys, where halfbrown switches from Vec to HashMap.
Key changes
- Conditionally define Object type alias based on preserve_order feature
- Update ObjectHasher to use RandomState for IndexMap compatibility
- Add ObjectMap type alias to abstract over HashMap/IndexMap
- Fix FromIterator impls to explicitly use hasher for IndexMap
- Update all deserializer structs to support both map types
- Add conditional imports for indexmap throughout codebase
Fixes ordering issue for objects with 33+ keys when preserve_order feature is enabled.
- As discussed in #378
- Tracked in #270
- Updated to use value-trait v0.12.1 (released: https://github.com/simd-lite/value-trait/pull/60#issuecomment-3395216624)
Success :tada:
Tests pass both with and without the new feature flag
cargo nextest run -F preserve_order -E 'test(preserve_order)'
Starting 8 tests across 7 binaries (448 tests skipped)
PASS [ 0.003s] simd-json value::borrowed::test::preserve_order_roundtrip_33_keys
PASS [ 0.003s] simd-json value::borrowed::test::preserve_order_32_keys_baseline
PASS [ 0.003s] simd-json value::borrowed::test::preserve_order_33_keys
PASS [ 0.003s] simd-json value::borrowed::test::preserve_order_50_keys
PASS [ 0.002s] simd-json value::owned::test::preserve_order_33_keys
PASS [ 0.003s] simd-json value::owned::test::preserve_order_roundtrip_33_keys
PASS [ 0.003s] simd-json value::owned::test::preserve_order_32_keys_baseline
PASS [ 0.004s] simd-json value::owned::test::preserve_order_50_keys
────────────
Summary [ 0.037s] 8 tests run: 8 passed, 448 skipped
Full suite:
- No features
Summary [ 0.926s] 448 tests run: 448 passed, 2 skipped
-
-F preserve_order
Summary [ 0.960s] 454 tests run: 454 passed, 2 skipped
Rewrite
Components implemented
- [x] cmp
- [x] from
- [x] de
- [x] se
- [ ] lazy
- [ ] tape
:heavy_check_mark: Compatible with all features
Status
- A lot of heavy lifting has been done, some of the initial "feature switching" approach is still in there, most notably in the ObjectMap which is used in the generic
deserializefunctions. These functions need to work with both ordered and non-ordered types, depending on what the caller wants. So de/serialisation will either need to get additional separate deserialise functions for ordered types, or the existing ones should work generically with both.
Sorry Heinz, what do you mean? I outlined my approach here as 2 steps, and you accepted the PR in value-trait with the first one, which this uses as its follow-on 2nd step. What part did I miss?
Do you mean this comment?
Ja, I prefer a new type for the index map backed value over it switching to allow the use of multiple map backends in the same project without feature flag conflicts or surprising changes in behavior. An alternative to
OrderedBorrowedValuewould be parameterizingBorrowedValuewith a map backend not sure if that would improve things or make then tougher 😅 ultimately I think both are okay.
I took that last sentence to mean that the way I had outlined was okay, is this not what you had in mind?
edit oh, sorry I misread that as saying the approach I had or an alternative parameterising the type would both be okay, I see now you were actually suggesting 2 alternatives. Gotcha
I submitted this PR 6 hours ago and you replied 4 hours ago so I wasn't ignoring your comments, just looks like I implemented the wrong way :disappointed:
New type is now feature gated in but does not alter the existing one (the feature gate switch was an easier middle ground towards it though!)
Testing feature compatibility
Click to show script to run cargo tests with all feature combos
#!/usr/bin/env bash
set -e
FEATURES=(
arraybackend
value-no-dup-keys
128bit
known-key
swar-number-parsing
approx-number-parsing
serde_impl
alloc
no-inline
bench-serde
ordered-float
hints
perf
docsrs
runtime-detection
bench-all
bench-apache_builds
bench-event_stacktrace_10kb
bench-github_events
bench-canada
bench-citm_catalog
bench-log
bench-twitter
big-int-as-float
)
echo "==> Running feature tests for simd-json"
echo " (only failures will be shown)"
for feat in "${FEATURES[@]}"; do
echo -n "Testing feature: $feat ... "
if cargo nextest run -F "$feat" > /dev/null 2>&1; then
echo "ok"
else
echo "❌ failed (feature: $feat)"
fi
echo -n "Testing feature: preserve_order + $feat ... "
if cargo nextest run -F preserve_order -F "$feat" > /dev/null 2>&1; then
echo "ok"
else
echo "❌ failed (features: preserve_order + $feat)"
fi
done
echo "==> All tests completed."
Results:
==> Running feature tests for simd-json
(only failures will be shown)
Testing feature: arraybackend ... ok
Testing feature: preserve_order + arraybackend ... ok
Testing feature: value-no-dup-keys ... ok
Testing feature: preserve_order + value-no-dup-keys ... ok
Testing feature: 128bit ... ok
Testing feature: preserve_order + 128bit ... ❌ failed (features: preserve_order + 128bit)
Testing feature: known-key ... ok
Testing feature: preserve_order + known-key ... ❌ failed (features: preserve_order + known-key)
Testing feature: swar-number-parsing ... ok
Testing feature: preserve_order + swar-number-parsing ... ok
Testing feature: approx-number-parsing ... ok
Testing feature: preserve_order + approx-number-parsing ... ok
Testing feature: serde_impl ... ok
Testing feature: preserve_order + serde_impl ... ok
Testing feature: alloc ... ok
Testing feature: preserve_order + alloc ... ok
Testing feature: no-inline ... ok
Testing feature: preserve_order + no-inline ... ok
Testing feature: bench-serde ... ok
Testing feature: preserve_order + bench-serde ... ok
Testing feature: ordered-float ... ok
Testing feature: preserve_order + ordered-float ... ok
Testing feature: hints ... ❌ failed (feature: hints)
Testing feature: preserve_order + hints ... ❌ failed (features: preserve_order + hints)
Testing feature: perf ... ok
Testing feature: preserve_order + perf ... ok
Testing feature: docsrs ... ok
Testing feature: preserve_order + docsrs ... ok
Testing feature: runtime-detection ... ok
Testing feature: preserve_order + runtime-detection ... ok
Testing feature: bench-all ... ok
Testing feature: preserve_order + bench-all ... ok
Testing feature: bench-apache_builds ... ok
Testing feature: preserve_order + bench-apache_builds ... ok
Testing feature: bench-event_stacktrace_10kb ... ok
Testing feature: preserve_order + bench-event_stacktrace_10kb ... ok
Testing feature: bench-github_events ... ok
Testing feature: preserve_order + bench-github_events ... ok
Testing feature: bench-canada ... ok
Testing feature: preserve_order + bench-canada ... ok
Testing feature: bench-citm_catalog ... ok
Testing feature: preserve_order + bench-citm_catalog ... ok
Testing feature: bench-log ... ok
Testing feature: preserve_order + bench-log ... ok
Testing feature: bench-twitter ... ok
Testing feature: preserve_order + bench-twitter ... ok
Testing feature: big-int-as-float ... ❌ failed (feature: big-int-as-float)
Testing feature: preserve_order + big-int-as-float ... ❌ failed (features: preserve_order + big-int-as-float)
==> All tests completed.
Feature combinations that failed:
- ❌
fixedpreserve_order + 128bit - ❌
fixedpreserve_order + known-key
The hints and big-int-as-float features did not have passing tests after I checked out the main branch either, so false negative
✅ All other combinations passed.
128bit
The build broke due to missing from.rs feature-gated entries for the 128 bit types like the unordered value from modules had
known-key
The build breaks because preserve_order hardcodes IndexMap to use RandomState, while known-key switches the expected hasher to NotSoRandomState, causing a type mismatch.
Sadly we're living in a time where I've to write reviews like this but I'm getting the strong feeling you're throwing an AI at this problem without thinking or reviewing your own PR.
If that's the case please just close the PR, I'm neither interested nor willing in reviewing 7000 LOC AI slop nor taking on the maintenance burden of it
I am not "just throwing an AI at this problem without thinking or reviewing your own PR", no. Ouch :smiling_face_with_tear:
I understand what I'm doing, and the draft status of this PR I put back on it after your initial review was intended to communicate 'Work In Progress', sorry I could've made that more clear.
some of the initial "feature switching" approach is still in there
Specifically this is where the PR description noted that this is WIP
I spent yesterday rewriting from the easy initial proof of concept (simple feature gated approach which swapped the hashmap, that was simple to implement and worked but the wrong design) to the new one (adding the indexmap so it won't interfere with the existing one, that requires more extensive changes). The main difference being it needs all new impls (for the value: cmp, from; for the serde value: de, se).
Code production and editing has been done by me, incrementally moving from the feature gated swapping to the feature gated addition style not automated with an AI agent.
I rolled back the original commits and re-pushed because at midnight I was done for the night having made good progress, not because I was finished/ready to review.
The size of the code additions has mainly come from plain old cp tool copying files (+ rg, sed, vim etc.), as introducing new types requires duplicating impls etc. That makes sense in this context to do, because the super::Value when applied to a submodule under ordered:: can simply refer to the ordered Value type and leave it all the same (e.g. this goes for the cmp and from modules). We already discussed doing this, please let me know if you think the code growth here is an undesirable maintenance burden but simply diffing them against their existing corresponding files should show they are nothing genuinely new. Refactoring said duplicated modules is of course possible to DRY this out, but I felt the first port of call should just be duplicating in standalone modules, and figuring out better ways on further iterations.
I have written a checklist of the components that were successfully ported (all except lazy/tape) because this is a big undertaking. The last thing I did here was notice that I had indeed left the ObjectHasher (as you also noted in review).
- I was also meaning to ask whether or not making lazy/tape support indexmap should be considered necessary for this PR to land
The last thing I did last night was leave a "note to self" to address that very thing:
In short I put this PR back to a draft PR because it is not ready after I initially took it in the wrong direction (as already discussed), and was not expecting you to review. I woke up this morning and the first thought I had upon waking was about the next step to implement, I am absolutely not "not thinking" or not invested in doing this right.
Sorry for the rudeness @lmmx
Aha it's all good :sweat_smile: :pray: