automap
automap copied to clipboard
Store keys directly in the table, instead of in a separate list
It occurred to me recently that the internal representation of our hash table does not need to concern itself with ordering. In fact, because of the nature of our mappings, order can be determined at any point by just using the values! Getting rid of our lists of keys would save a lot of allocations during construction, and some indirection during lookups.
If we did this, unordered iteration would still be linear, but ordered iteration would be quadratic with a naive implementation and quasi-linear with a more clever, complicated one.
Also, we would want to protect against adding keys during unordered iteration (which is super easy to check using the map's length). There's precedent for this with dict
:
>>> d = dict.fromkeys(["S", "p", "a", "m"])
>>> for k in d:
... d["X"] = None
...
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
RuntimeError: dictionary changed size during iteration
As a side note, this also closes the door on one clever way that users can currently put our tables in a bad state (possibly causing hard crashes): by getting our hidden lists of keys/values using the gc
module, and mutating them in-place.
@flexatone, some questions for you:
- Does StaticFrame care about
automap
's iteration order? If so, does it care about the performance of ordered iteration? - If not, does need to be able to add keys during unordered iteration?
My guess is that the answer to both of these is "no", but it would be nice to know for sure.
Thanks, @brandtbucher , for these thoughts.
First, reviewing StaticFrame's code, there is at least one case where we rely on the ordered iteration of automap
; I am not certain, however, how often that case happens, and at first look, it does not seem necessary or unrefactorable. As StaticFrame Index objects retain a "labels" array, it is this that is used for implementing __iter__
and other order-dependent label evaluations.
Second, I am certain we never need to add keys during iteration.
If this change offers significant performance gains for initialization, it likely would be worth any changes StaticFrame would need to make.
Returning to the first question, if lookup is till quasi constant time, why would ordered iteration not be linear if, for example, we were just driving lookups with a range
, i.e., (map[i] for i in range(len(map)))
?
Cool!
Returning to the first question, if lookup is till quasi constant time, why would ordered iteration not be linear if, for example, we were just driving lookups with a
range
, i.e.,(map[i] for i in range(len(map)))
?
That would only work if the mapping was an identity mapping of autoincremented integers i -> i
. range(len(map))
equals map.values()
, not necessarily map.keys()
.
We basically need to do the equivalent of sorted(map, key=map.__getitem__)
.