python-sortedcontainers
python-sortedcontainers copied to clipboard
Slow SortedDict.copy() and SortedDict.values()
Hi,
I am looking at using a sorted dict implementation and I have noticed that the copy() function of SortedDict was somewhat "slow", especially compared with other implementations.
Here's the code to test (in Jupyter because of the use of %timeit):
from sortedcontainers import SortedDict
from sortedmap import sortedmap
import string
import random
import numpy as np
letters = string.ascii_uppercase + string.digits
positions = SortedDict({key: 0.0 for key in [''.join(random.choice(letters) for i in range(12)) for _ in range(5000)]})
positions_dict = dict(positions)
print("SortedDict copy:")
%timeit positions.copy()
print("init with SortedDict:")
%timeit SortedDict(positions)
print("init with dict:")
%timeit SortedDict(positions_dict)
print("dict copy:")
%timeit positions_dict.copy()
print("sortedmap copy:")
positions_map = sortedmap(positions)
%timeit positions_map.copy()
print("SortedDict fromiter:")
%timeit np.fromiter(positions.keys(), dtype="U12", count=len(positions_map))
%timeit np.fromiter(positions.values(), dtype="float", count=len(positions_map))
print("sortedmap fromiter:")
%timeit np.fromiter(positions_map.keys(), dtype="U12", count=len(positions))
%timeit np.fromiter(positions_map.values(), dtype="float", count=len(positions))
and here are the results:
SortedDict copy:
1.09 ms ± 10.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
init with SortedDict:
762 µs ± 2.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
init with dict:
258 µs ± 1.55 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
dict copy:
44.3 µs ± 225 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
sortedmap copy:
303 µs ± 2.25 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
SortedDict fromiter:
309 µs ± 950 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
642 µs ± 9.51 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
sortedmap fromiter:
307 µs ± 2.56 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
180 µs ± 805 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
Also I am not sure why the fromiter over the float values in case of SortedDict is considerably slower than with the string version?
I am working on a piece of code where I am chasing milliseconds, and was wondering if there was anythingh I could do to make it faster?
Thanks
The existing copy() code just calls init() and could be made a bit faster by inlining the call and copying internal data structures. Try inheriting and overriding the copy() method to experiment.
Also, I looked at the sortedmap copy() method and it looks shallow wrt the underlying c++ map data structure. Sorted Containers may be doing a deeper copy.