lpython icon indicating copy to clipboard operation
lpython copied to clipboard

Dict implementation design

Open certik opened this issue 1 year ago • 13 comments

Here are the freedoms that we have in implementation dictionaries:

  • Hash function
  • load factor / initial size of an array
  • array resizing strategy (rehashing or not)
  • key collision strategy
  • memory allocation
  • ordered vs unordered
  • hash table vs red-black tree

Here is how we can do this:

  • [ ] We start with some defaults (just one choice for all of the above)
  • [ ] We implement more (eventually all) choices, and allow the user to select using a type annotation, such as x: Dict[hash3, loadfactor(5), resizing("x"), collision_strategy("list")].
  • [ ] We can then benchmark various combinations and choose some good solid defaults
  • [ ] We can later add a user defined hash function that gets compiled and inlined in LLVM, so optimal performance.
  • [ ] Ideally the hash function should be defined in the frontend language too

Note: #975 implements unordered dict. The ordered dict implementation can be implemented on top by adding some data structure to maintain the order of key insertion nodes (such as linked list), and another hash table of key -> node, so that you can do insertion, lookup and deletion all in O(1); it seems lookup is as fast as for unordered dict, and so is unordered iteration; but insertion and deletion is slower due to updating of the two extra datastructures; and there is a new feature "ordered iteration" (that did not exist in unordered dict, but it is slower than "unordered iteration"); more details at https://github.com/Aqcurate/OrderedDict#ordered-dictionary-implementation.

certik avatar Aug 17 '22 13:08 certik