python-sortedcontainers icon indicating copy to clipboard operation
python-sortedcontainers copied to clipboard

Slow SortedDict.copy() and SortedDict.values()

Open atellier opened this issue 4 years ago • 1 comments
trafficstars

Hi,

I am looking at using a sorted dict implementation and I have noticed that the copy() function of SortedDict was somewhat "slow", especially compared with other implementations.

Here's the code to test (in Jupyter because of the use of %timeit):

from sortedcontainers import SortedDict
from sortedmap import sortedmap
import string
import random
import numpy as np

letters = string.ascii_uppercase + string.digits
positions = SortedDict({key: 0.0 for key in  [''.join(random.choice(letters) for i in range(12)) for _ in range(5000)]})
positions_dict = dict(positions)

print("SortedDict copy:")
%timeit positions.copy()
print("init with SortedDict:")
%timeit SortedDict(positions)
print("init with dict:")
%timeit SortedDict(positions_dict)
print("dict copy:")
%timeit positions_dict.copy()

print("sortedmap copy:")
positions_map = sortedmap(positions)
%timeit positions_map.copy()

print("SortedDict fromiter:")
%timeit np.fromiter(positions.keys(), dtype="U12", count=len(positions_map))
%timeit np.fromiter(positions.values(), dtype="float", count=len(positions_map))

print("sortedmap fromiter:")
%timeit np.fromiter(positions_map.keys(), dtype="U12", count=len(positions))
%timeit np.fromiter(positions_map.values(), dtype="float", count=len(positions))

and here are the results:

SortedDict copy:
1.09 ms ± 10.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
init with SortedDict:
762 µs ± 2.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
init with dict:
258 µs ± 1.55 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
dict copy:
44.3 µs ± 225 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
sortedmap copy:
303 µs ± 2.25 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
SortedDict fromiter:
309 µs ± 950 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
642 µs ± 9.51 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
sortedmap fromiter:
307 µs ± 2.56 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
180 µs ± 805 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

Also I am not sure why the fromiter over the float values in case of SortedDict is considerably slower than with the string version?

I am working on a piece of code where I am chasing milliseconds, and was wondering if there was anythingh I could do to make it faster?

Thanks

atellier avatar Mar 28 '21 19:03 atellier

The existing copy() code just calls init() and could be made a bit faster by inlining the call and copying internal data structures. Try inheriting and overriding the copy() method to experiment.

Also, I looked at the sortedmap copy() method and it looks shallow wrt the underlying c++ map data structure. Sorted Containers may be doing a deeper copy.

grantjenks avatar Apr 07 '21 16:04 grantjenks