
[Stdlib] Speedup `Dict` (reuse removed _DictIndex slots (not entries))

rd4com opened this issue 8 months ago

Hello, 🐍 Python went through its own adventure of improving its dictionary,

and there are many Python conference ▶️ videos that talk about it.

The speakers explain so well what a dictionary is and what matters for its performance

that I decided to learn and try to work a little bit on it myself.

 

♻️ Problem solved:

If a key is popped, the slot `hash % self._reserved()` is marked as 🗑️ removed in `self._index`. If the key is reinserted, its hash has to be 🔂 perturbed until we find a slot marked empty.

This means that inserting and popping the same key X times in a row makes its probe sequence X times longer.

The current PR lets us reuse ♻️ removed slots to create the shortest probe sequence possible. (This does not affect the order of the dict, just its probe sequence.)

Insertion still probes until it finds an empty slot, to make sure the key does not already exist, but if it encountered a removed slot on the way, it reuses that one.

That way, the slot is always the smallest number of perturb steps away.
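To make the idea concrete, here is a minimal Python sketch of an open-addressing probe that remembers the first removed (tombstone) slot it meets and hands it back for reuse. The sentinel values, the function name, and the CPython-style perturbation step are all assumptions for illustration, not the actual Mojo stdlib code.

```python
EMPTY, REMOVED = -1, -2  # assumed sentinel markers, mirroring _DictIndex semantics

def find_slot(index, key_hash, key, entries):
    """Probe for `key`; remember the first REMOVED slot so it can be reused.

    Hypothetical sketch: uses the CPython-style perturb probe, not the
    exact Mojo implementation.
    """
    mask = len(index) - 1
    slot = key_hash & mask
    perturb = key_hash
    first_removed = -1
    while True:
        entry = index[slot]
        if entry == EMPTY:
            # Key is definitely absent; reuse the earliest tombstone if any.
            return first_removed if first_removed != -1 else slot
        if entry == REMOVED:
            if first_removed == -1:
                first_removed = slot  # remember it, but keep probing
        elif entries[entry][0] == key:
            return slot  # key already present: keep its current slot
        # CPython-style probe step
        perturb >>= 5
        slot = (5 * slot + perturb + 1) & mask
```

With this scheme, pop-then-reinsert puts the key back at the earliest tombstone on its probe sequence instead of pushing it one perturb step further each round.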

 

❄️ Another change is in `_maybe_resize()`:

We just append `None` to the entries `List` `self._reserved()` times, which doubles it.

Then we call `self._compact()`, which recalculates the indexes and compacts the entries!

It accomplishes the same as before, but faster, because `_compact` uses `_find_empty_index` instead of `_find_index` (which does an equality check on the keys).

We can skip that equality check

because the dict already guarantees that every entry it contains is unique.
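A small Python sketch of why re-indexing can skip key equality: since every live entry is already known to be unique, each one only needs the first empty slot on its probe sequence (the analogue of `_find_empty_index` vs `_find_index`). The hole representation and probe step here are assumptions for illustration.

```python
def reindex(entries, new_capacity):
    """Rebuild the slot index after doubling, without any key-equality checks.

    Hypothetical sketch: entries is a list of (key, value) pairs with None
    marking holes left by pop(); new_capacity is a power of two.
    """
    EMPTY = -1
    index = [EMPTY] * new_capacity
    mask = new_capacity - 1
    for i, entry in enumerate(entries):
        if entry is None:
            continue  # hole left by a pop
        key, _value = entry
        slot = hash(key) & mask
        perturb = hash(key)
        while index[slot] != EMPTY:  # first empty slot suffices: no equality test
            perturb >>= 5
            slot = (5 * slot + perturb + 1) & mask
        index[slot] = i
    return index
```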

 

💡 Doing the compaction is a very costly thing;

maybe we should just check whether it is needed once we have doubled the size of the dict?

Remember, the first compaction check happens only if we don't need to resize (`not self._over_load_factor()`).

Once we are in the resize logic, we still don't know whether we need to compact;

checking there could save us from moving many items in the list.

If no compaction is needed, we could just re-index each entry.

 

(If the user popped only one key in the middle of many entries, should we move everything just for that?)

 

I think that the resize×2 logic needs one new step:

  1. Double the size by appending `None`.
  2. Check whether we need to compact (calculated using the new `self._reserved()`):
     a. yes: call `self._compact`, which does all the work for us;
     b. no: just assign empty slots to each live entry.

(The current PR includes that change ✅)

What do you think ?

Benchmark 1, which focuses on the new feature:

94x speedup on this particular benchmark (this benchmark is designed for these new features).

A: 22730192154 B: 239400760

```mojo
from time import now

def main():
    var result: Int = 0
    var dict_size = 1 << 12
    var start = now()
    var stop = now()

    result = 0

    small2 = Dict[Int, Int]()
    start = now()
    for x in range(200):
        for i in range(dict_size):
            small2[i] = i
            result += small2[i]
            result += small2.pop(0)
            small2[0] = 0
    stop = now()
    print(stop - start, result)
```

Benchmark 2, which focuses on just the resize-by-2 change:

~1.6x (difficult to say without feedback).

A: 1163417881 B: 698271306

```mojo
from time import now

def main():
    var result: Int = 0
    var dict_size = 1 << 16
    var start = now()
    var stop = now()

    result = 0

    start = now()
    for x in range(50):
        small2 = Dict[Int, Int]()
        for i in range(dict_size):
            small2[i + x] = i + x
            result += small2[i + x]
    stop = now()
    print(stop - start, result)
```

Benchmark 3:

From 1.3x to 2x.

```mojo
from time import now

def main():
    var result: Int = 0
    var dict_size = 1 << 8
    var start = now()
    var stop = now()

    result = 0

    small2 = Dict[Int, Int]()
    start = now()
    for x in range(8):
        for i in range(dict_size):
            # grow the dictionary up to its new size
            small2[i] = i
        for y in range(100):
            for i in range(dict_size >> 2):
                # pop one fourth of it
                _ = small2.pop(i)
            for i in range(dict_size >> 2):
                # assign it back
                small2[i] = i
            for i in range(dict_size >> 2):
                result += small2[i]
        # double the size of the dictionary
        dict_size *= 2
    stop = now()
    print(stop - start, result)
```

Benchmark 4, resizing for small dictionaries:

1.39x speedup.

A: 745216 B: 533981

```mojo
from time import now

def main():
    var result: Int = 0
    var dict_size = 1 << 8
    var start = now()
    var stop = now()

    result = 0

    start = now()
    for x in range(16):
        small2 = Dict[Int, Int]()
        for i in range(dict_size):
            small2[i] = i
            result += small2[i]
    stop = now()
    print(stop - start, result)
```

The dictionary starts with a capacity of 8 entries and doubles after some insertions until it reaches 512.

Each time it exceeds 2/3 of its current capacity, it doubles. (This check is done after each insertion.)

(This is what `_maybe_resize` does.)
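The growth schedule described above can be sketched in a few lines of Python. The 2/3 threshold and initial capacity of 8 come from the description; the helper itself and the exact form of the comparison are assumptions for illustration.

```python
def capacity_after(n_insertions, initial_capacity=8):
    """Track capacity doublings under an assumed 2/3 load factor.

    Illustrative sketch of the growth described above, not the actual
    _maybe_resize code: one check after each insertion, doubling when
    the dict goes over 2/3 full.
    """
    capacity, size = initial_capacity, 0
    for _ in range(n_insertions):
        size += 1
        if size * 3 > capacity * 2:  # over 2/3 full: double
            capacity *= 2
    return capacity
```

Under these assumptions, filling a dict with the 256 keys of Benchmark 4 walks the capacity from 8 up through each doubling to 512.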


It is difficult to say how much better (or not) this is;

hopefully, somebody will be able to ⏱️ benchmark this well and decide?

 

I'd like to finish this long PR description with a "good job" for our Dict,

it is probably one of users' most favourite types,

let's continue!

rd4com avatar Jun 26 '24 22:06 rd4com