indexmap icon indicating copy to clipboard operation
indexmap copied to clipboard

Parameterize the index type of maps and sets

Open cuviper opened this issue 4 years ago • 10 comments

The index type is a new generic parameter defaulting to usize:

pub struct IndexSet<T, S = RandomState, Idx = usize>
pub struct IndexMap<K, V, S = RandomState, Idx = usize>

In no_std builds, S has no default, but still Idx = usize.

This may be used to optimized the storage size in the raw table, for example using a u32 index if you know that u32::MAX capacity is sufficient for you. It may also be useful to define a newtype index custom to a particular domain, which only needs to implement the new Indexable trait to be useful here.

cuviper avatar Aug 08 '20 04:08 cuviper

Nice idea and pretty smooth implementation.

With the into/from usize conversion, we insert a lot more panic sites in the code. And I'd like to ensure we can't panic in places that leave the map inconsistent. (Is this a too onerous goal?)

The first usize conversion in .pop() looks like a problem. But it is actually guarded (global invariant) if we just ensure the map length never grows larger than what the index size can handle. However, it is not apparent if/how the insert method ensures this. (This assumes a faithful fancy index to usize conversion - an unfaithful one can of course panic anyway, because invariants like that are useless.)

Then we document that insert panics if the index type's limits are exceeded etc. (And .extend() and .collect() et.c. 🙁 I'm just sad that it gets ugly in the details.)

bluss avatar Jan 01 '21 13:01 bluss

@bluss it's not clear to me that a smaller index type needs special treatment – after all, the current logic also panics if an allocation can't be met (for whatever reason, possibly capacity > usize).

malthe avatar Jan 23 '21 08:01 malthe

@malthe That's too hand-wavy, the exact line and place where we can panic matters for which inconsistencies can show up in the data structure.

bluss avatar Jan 23 '21 09:01 bluss

@bluss but if we panic, what's left of the program? Is it not just going to halt?

Another issue I found was that it feels inconsistent that len() should return a usize when the index parameter is something different.

malthe avatar Jan 23 '21 20:01 malthe

Panic is recoverable, so that's why it matters.

bluss avatar Jan 23 '21 23:01 bluss

@cuviper I have begun some rebasing against the current master on this one, but let me know if you already have something cooking.

malthe avatar Feb 17 '21 07:02 malthe

Len should probably continue to return usize, just to note - without looking deeper.

bluss avatar Feb 17 '21 08:02 bluss

@malthe Sorry, I don't know how far you got, but I just pushed a rebase myself. I was responsible for a lot of the conflicting changes anyway, so I had a good idea how to solve things.

I have mixed feelings whether the length and capacity should be usize or Idx. As usize, there's a small bonus that the maximum like 255u8 is a usable index, since we can still report length 256. As Idx, we would at least need Indexable to tell us its maximum value, so we can saturate when RawTable over-allocates. (e.g. RawTable::with_capacity(224) will allocate 256 and report capacity 224 exactly, but with_capacity(225) allocates 512 and reports capacity 448 -- too high for Idx = u8.) It might make some of those Idx <-> usize panics easier to deal with if the maximum is a sentinel though.

cuviper avatar Mar 03 '21 20:03 cuviper

@cuviper yeah I think it has to be usize – but perhaps there could be a function that returns a range, i.e. 0..length - 1 which then would be type-safe on the index type. Thanks for the rebasing, there were so many conflicts to resolve.

malthe avatar Mar 03 '21 20:03 malthe

To get some real data, I tried to use this branch in rust-lang/rust#96751 and its perf infrastructure. For instructions, there were a few small wins, but nothing better than 1% improvement, and one case regressed 1.68%. Maybe that's a good thing that it was mostly neutral, since this has added usize conversions in a lot of places. But max-rss didn't improve either, just looks like noise.

It's possible that I just didn't make good use of it, but those were all the places I could find that clearly dealt with smaller indexes. Maybe there are more significant maps that would be happy limited to u32::MAX items, just as the compiler often does with its IndexVec.

Overall, I'm not inclined to merge this if there's no clear winner, as the new parameter is pretty invasive.

cuviper avatar May 07 '22 00:05 cuviper