indexmap
indexmap copied to clipboard
Parameterize the index type of maps and sets
The index type is a new generic parameter defaulting to usize
:
pub struct IndexSet<T, S = RandomState, Idx = usize>
pub struct IndexMap<K, V, S = RandomState, Idx = usize>
In no_std
builds, S
has no default, but still Idx = usize
.
This may be used to optimized the storage size in the raw table, for
example using a u32
index if you know that u32::MAX
capacity is
sufficient for you. It may also be useful to define a newtype index
custom to a particular domain, which only needs to implement the new
Indexable
trait to be useful here.
Nice idea and pretty smooth implementation.
With the into/from usize conversion, we insert a lot more panic sites in the code. And I'd like to ensure we can't panic in places that leave the map inconsistent. (Is this a too onerous goal?)
The first usize conversion in .pop()
looks like a problem. But it is actually guarded (global invariant) if we just ensure the map length never grows larger than what the index size can handle. However, it is not apparent if/how the insert method ensures this.
(This assumes a faithful fancy index to usize conversion - an unfaithful one can of course panic anyway, because invariants like that are useless.)
Then we document that insert panics if the index type's limits are exceeded etc. (And .extend() and .collect() et.c. 🙁 I'm just sad that it gets ugly in the details.)
@bluss it's not clear to me that a smaller index type needs special treatment – after all, the current logic also panics if an allocation can't be met (for whatever reason, possibly capacity > usize).
@malthe That's too hand-wavy, the exact line and place where we can panic matters for which inconsistencies can show up in the data structure.
@bluss but if we panic, what's left of the program? Is it not just going to halt?
Another issue I found was that it feels inconsistent that len()
should return a usize when the index parameter is something different.
Panic is recoverable, so that's why it matters.
@cuviper I have begun some rebasing against the current master on this one, but let me know if you already have something cooking.
Len should probably continue to return usize, just to note - without looking deeper.
@malthe Sorry, I don't know how far you got, but I just pushed a rebase myself. I was responsible for a lot of the conflicting changes anyway, so I had a good idea how to solve things.
I have mixed feelings whether the length and capacity should be usize
or Idx
. As usize
, there's a small bonus that the maximum like 255u8
is a usable index, since we can still report length 256. As Idx
, we would at least need Indexable
to tell us its maximum value, so we can saturate when RawTable
over-allocates. (e.g. RawTable::with_capacity(224)
will allocate 256 and report capacity 224 exactly, but with_capacity(225)
allocates 512 and reports capacity 448 -- too high for Idx = u8
.) It might make some of those Idx
<-> usize
panics easier to deal with if the maximum is a sentinel though.
@cuviper yeah I think it has to be usize
– but perhaps there could be a function that returns a range, i.e. 0..length - 1
which then would be type-safe on the index type. Thanks for the rebasing, there were so many conflicts to resolve.
To get some real data, I tried to use this branch in rust-lang/rust#96751 and its perf infrastructure. For instructions, there were a few small wins, but nothing better than 1% improvement, and one case regressed 1.68%. Maybe that's a good thing that it was mostly neutral, since this has added usize
conversions in a lot of places. But max-rss didn't improve either, just looks like noise.
It's possible that I just didn't make good use of it, but those were all the places I could find that clearly dealt with smaller indexes. Maybe there are more significant maps that would be happy limited to u32::MAX
items, just as the compiler often does with its IndexVec
.
Overall, I'm not inclined to merge this if there's no clear winner, as the new parameter is pretty invasive.