robin-map
robin-map copied to clipboard
it = map.erase(it) potentially passes entries twice
I've stumbled upon a bug in my robin_hood map which also uses backward shift deleting, and wondered how you solve this issue in tsl::robin_map
, and it seems you have exactly the same issue. (see https://github.com/martinus/robin-hood-hashing/issues/42)
The problem is that due to backward shift deleting, when iterating a map and erasing elements, it is possible that the same elements will be iterated over twice.
Here is a reproducer, where I insert two elements into the map. When erasing the second element, the first element will wrap around and be iterated over a second time. I currently think it is best to just claim that's the way it is, because I am afraid there is no easy workaround (except not using backward shift deleting)
template <typename T>
struct BadHash {
size_t operator()(T const& val) const {
return static_cast<size_t>(val);
}
};
int main(int, char**) {
tsl::robin_map<int, int, BadHash<int>> map;
map.reserve(16); // resizes with mask 31
map.emplace(31, 1); // gets into last bucket
map.emplace(1024 + 31, 2); // would also get into last bucket, but wraps around
// first entry in the map
auto it = map.begin();
std::cout << it->first << "->" << it->second << std::endl;
// second entry in the map, erase it
++it;
std::cout << it->first << "->" << it->second << std::endl;
it = map.erase(it);
// now all two elements are iterated, we should be at the map's end, but due to backward shift
// deletion we are not
if (it != map.end()) {
std::cout << it->first << "->" << it->second << std::endl;
}
it = map.erase(it);
std::cout << "are we now at the end? " << (it == map.end()) << std::endl;
}
I wonder what you think about that issue?
Hello,
Thank you very much for the report. I effectively completely missed that when implementing the backward shift deletion.
I started the implementation of a fix in the fix_issue_20 branch. I still have to check a bit if everything works fine and improve the documentation before merging the changes.
Basically I check if I wrap around during the backward shift deletion and if it's the case i just return end()
.
Thibaut
I think simply checking for a wrap around is not enough. E.g when you insert 3 elements at location 30, 2 elements are at the end and one will wrap around. When erasing the second to last, backward shift will wrap around but there is still one element left to be iterated
Effectively I went a bit too fast with the proposed solution, it will not work in multiple cases. I'll check this evening to see how we can do that.
Hi,
I checked a bit and currently I see the following possible solutions:
- Keeping the same behaviour as
std::unordered_map
.- Don't wrap around, we just add a buffer at the end of the buckets (that is not counted in the
bucket_count
) to store the elements that would have wrapped around. We rehash the hash table if the buffer gets full regardless of the current load factor. The problem is the risk of memory DoS attacks as an attacker may force the hash table to rehash by inserting elements at the end. - When creating an iterator, add a reference to the last current element of the hash table except on erase where we propagate the reference to the returned iterator. If on erase we see that the element of the iterator is the same as the reference to the last element in the iterator we return
end()
after erasing it instead of the next element. - When calling
begin()
, return the first element that was not wrapped around. When incrementing the iterator, wrap around and continue until the last wrapped element
- Don't wrap around, we just add a buffer at the end of the buckets (that is not counted in the
- Different behaviour compare to
std::unordered_map
.- Leave it as it is and document the difference. When using erase in a loop it's usually to erase an element that meets a condition (e.g.
if(it->first % 2 == 0) it = erase(it);
) and not erase everything that come after an element. - Just don't return anything, we just change the return type to
void
. Not really viable as erasing while iterating can be quite useful.
- Leave it as it is and document the difference. When using erase in a loop it's usually to erase an element that meets a condition (e.g.
The problem is more complex than expected and I have to think a bit if I can find another solution or which solution to pick-up.
Thibaut
I also thought about adding an overflow buffer at the end. This might have a slight performance advantage because less need to & (tablesize-1)
. The DoS attack could also be mitigated with a hash that e.g. uses the this
pointer, then the positions become less predictable.
Your iterator idea brings me to a new idea: How about instead of a reference, add a counter of the number of remaining items. begin()
sets the counter to size()
, and end()
sets the counter to 0. Each ++it
or it = erase(it)
decrease the counter, comparing iterators just need to compare the counter. That needs no specil wrap-around handling, and begin() and end() stay fast operations
Adding a counter would also be possible but the problem is that you also need to initialize it in other methods than begin()
. For example on insertion you need to return an iterator to the inserted element, how can you get the remaining size until end()
efficiently?
I also realised that a reference to the last element is not really possible either as it may be invalidated. It also has the same problem as with a counter, how do you efficiently get a reference to the last element in some methods? You could eventually maintain a pointer to the last element in the class and update it on each insert/erase. But it's getting quite complex.