automap icon indicating copy to clipboard operation
automap copied to clipboard

Store keys directly in the table, instead of in a separate list

Open brandtbucher opened this issue 2 years ago • 2 comments

It occurred to me recently that the internal representation of our hash table does not need to concern itself with ordering. In fact, because of the nature of our mappings, order can be determined at any point by just using the values! Getting rid of our lists of keys would save a lot of allocations during construction, and some indirection during lookups.

If we did this, unordered iteration would still be linear, but ordered iteration would be quadratic with a naive implementation and quasi-linear with a more clever, complicated one.

Also, we would want to protect against adding keys during unordered iteration (which is super easy to check using the map's length). There's precedent for this with dict:

>>> d = dict.fromkeys(["S", "p", "a", "m"])
>>> for k in d:
...     d["X"] = None
... 
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RuntimeError: dictionary changed size during iteration

As a side note, this also closes the door on one clever way that users can currently put our tables in a bad state (possibly causing hard crashes): by getting our hidden lists of keys/values using the gc module, and mutating them in-place.

@flexatone, some questions for you:

  • Does StaticFrame care about automap's iteration order? If so, does it care about the performance of ordered iteration?
  • If not, does need to be able to add keys during unordered iteration?

My guess is that the answer to both of these is "no", but it would be nice to know for sure.

brandtbucher avatar Oct 01 '22 20:10 brandtbucher

Thanks, @brandtbucher , for these thoughts.

First, reviewing StaticFrame's code, there is at least one case where we rely on the ordered iteration of automap; I am not certain, however, how often that case happens, and at first look, it does not seem necessary or unrefactorable. As StaticFrame Index objects retain a "labels" array, it is this that is used for implementing __iter__ and other order-dependent label evaluations.

Second, I am certain we never need to add keys during iteration.

If this change offers significant performance gains for initialization, it likely would be worth any changes StaticFrame would need to make.

Returning to the first question, if lookup is till quasi constant time, why would ordered iteration not be linear if, for example, we were just driving lookups with a range, i.e., (map[i] for i in range(len(map)))?

flexatone avatar Oct 04 '22 16:10 flexatone

Cool!

Returning to the first question, if lookup is till quasi constant time, why would ordered iteration not be linear if, for example, we were just driving lookups with a range, i.e., (map[i] for i in range(len(map)))?

That would only work if the mapping was an identity mapping of autoincremented integers i -> i. range(len(map)) equals map.values(), not necessarily map.keys().

We basically need to do the equivalent of sorted(map, key=map.__getitem__).

brandtbucher avatar Oct 04 '22 17:10 brandtbucher