mojo icon indicating copy to clipboard operation
mojo copied to clipboard

[stdlib] Add `Dict._resize_down` and `_under_load_factor()`

Open rd4com opened this issue 7 months ago • 1 comments

Hello,

this is a PR to solve a problem mentioned by @bethebunny (in #3128)

Good little feature to have,

it probably should not be a default and be parametrized,

because of the cost in performance of halving memory.

💡 for the Dictionary that not down-resize

Did an implementation of it, maybe you'll like it !

It really grows and ungrows dynamically,

at each pop, if self.size < 1/3 of self._reserved(),

the Dict halves it's memory with a smaller list capacity.

(so the RAM usage is dynamic and change on the size)

When pop is used on all elements of a dict of 256 elements,

the transition goes from 512, 256, 128, 64, 32, 16, 8, 4

That way, it doubles on 2/3 and halves on 1/3

 

The ratio is quite simple and it seem to do the job: (same as for resizing up (below 1/3 of self._reserved()))

0.33 * 256 == 84.48 <- we are below (we can resize down) 0.66 * 128 == 84.48 <- we are below (no need to resize up again)

0.66 * 256 == 168.96 <- we are above (we can resize up) 0.33 * 512 == 168.96 < we are above (no need to resize down again)

Example of a Dict cycling trough grow and ungrow:

from time import now
def main():
    var result: Int=0
    var dict_size = 256
    var start = now()
    var stop = now()
    
    result = 0

    small2 = Dict[Int,Int]()
    start = now()
    for y in range(2):
        for i in range(dict_size):
            # grow the dictionary up to its full size
            print("before",small2._reserved(), len(small2))
            small2[i]=i
            print("after",small2._reserved(), len(small2))

        for i in range(dict_size):
            # ungrow the dictionary
            print("before",small2._reserved(), len(small2))
            result += small2.pop(i)
            print("after",small2._reserved(), len(small2))
    stop = now()
    print(stop-start, result)

The result is correct (65280):

x = 0
for y in range(2):
    for i in range(256):
        x+=i
print(x) #65280

 

Maybe that kind feature need to be parametrized, so users can choose? (resizing down reduce the performance, so may not be a good default)

  • DictConfig struct, as default Dict parameter is a possibility
    • we might end up in a parametrization challenge later
  • another Dict type is also a possibility (BaloonDictionary like) ?

I suggest the second one would be better, but struggle for naming here.

 

If reviewers likes it, maybe we can bring it to life with more reviews ?

(There are more methods where the logic need to be done, and tests)

 

Note that the hard work have been done by people working on Dict,

this pr just add a halving part, which use the same ratio as the doubling.

rd4com avatar Jun 28 '24 10:06 rd4com