python-sortedcontainers
python-sortedcontainers copied to clipboard
Feature request: SortedDict Range Deletion
Range key deletion feature is currently missing for SortedDict. If I want to delete multiple keys within an range, I have to do them one by one using a for loop which is very inefficient in terms of time complexity. See my example below.
# program that removes the subdict from 2 to 4
d = SortedDict({1: 1, 2: 2, 3: 3, 4: 4, 5: 5})
lower_bound_index = d.bisect_left(2)
upper_bound_index = d.bisect_left(4)
for _ in range(lower_bound_index, upper_bound_index + 1):
d.popitem(lower_bound_index)
This is very inefficient both in terms of usage and in terms of time complexity. Time complexity for this one is O(k log n) where k is the range size. Other languages such as C++ and Java only needs an time complexity of O(k + log n) to accomplish the job with the API mentioned below.
In C++, you have the option of calling erase(iterator first, iterator last) for their map data structure. Similarly, in Java, you can call subMap(first, last).clear() for their TreeMap data structure. Something similar should be available for SortedDict in my opinion.
Here's another hacky way of doing it. Not sure if it is better in terms of performance. But still, an API would be nice.
def play():
sd = SortedDict({1: 1, 2: 2, 3: 3, 4: 4, 5: 5})
low = sd.bisect_left(2)
high = sd.bisect_left(4)
for key in sd._list[low:high + 1]:
super(SortedDict, sd).pop(key)
del sd._list[low:high + 1]
print(sd)
Having a kind of range_delete makes sense to me. There’s a variety of optimizations that could be done inside the data structure too.
Note that today you can use irange instead of bisect to avoid using the position index.
Here's the best I can come up with today (without internal optimizations):
$ python -m IPython
Python 3.9.1 (v3.9.1:1e5d33e9b9, Dec 7 2020, 12:10:52)
Type 'copyright', 'credits' or 'license' for more information
IPython 7.29.0 -- An enhanced Interactive Python. Type '?' for help.
In [1]: import sortedcontainers
In [2]: class SortedDict(sortedcontainers.SortedDict):
...: def range_del(self, minimum=None, maximum=None, inclusive=(True, True)):
...: iterator = self.irange(minimum, maximum, inclusive)
...: keys = list(iterator)
...: for key in keys:
...: del self[key]
...:
In [3]: sd = SortedDict({'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5})
In [4]: sd
Out[4]: SortedDict({'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5})
In [5]: sd.range_del('b', 'd')
In [6]: sd
Out[6]: SortedDict({'a': 1, 'e': 5})
This is great in terms of simplicity. But I would still urge you to take a look at the hack implementation I wrote above. The main idea is that slice deletion for SortedList is already internally optimized to O(log n), which is used for keys, so why not just use it as it is and then take care of the dictionary deletion one by one. That way, the time complexity is O(k + log n) instead of O(k log n), which is inline with Sorted Maps in Java and C++ and other languages.
The performance of the two will differ depending on the size of the dictionary and how much is deleted. But I generally agree that an optimized version would be a good addition to the API.
The best improvement to your version would be to avoid building the positional index. If you like at the implementation of SortedList.irange, you can see how to do so.
But how would I achieve slice deletion if I don't have the positional indices? SortedList.__delitem__ only takes index or slice. It is only through slice deletion that we achieve internal optimization.
Hello, I'm currently facing the exact same problem and I was wondering whether any of you (@ThomasShaw1) had found some optimized solution ? I've tried the range_del @grantjenks wrote but it doesn't seem really optimized as it doesn't accelerate the deletion.
Best regards