dlang.org icon indicating copy to clipboard operation
dlang.org copied to clipboard

Fix Issue 16659 - Clarify mutating while iterating rules

Open wilzbach opened this issue 8 years ago • 23 comments

I just went with the current behavior - ideas for improvement are very welcome.

Btw ideally all these examples would get verified automatically and are runnable online ...

wilzbach avatar Jun 15 '17 06:06 wilzbach

Fix Bugzilla Description
16659 Clarify mutating while iterating rules

dlang-bot avatar Jun 15 '17 06:06 dlang-bot

I don't think allowing any mutation is a good idea, it's extremely dependent on implementation details. I think there should be at least a special contract for this, e.g. mutating range as @schveiguy proposed.

d-random-contributor avatar Jun 15 '17 10:06 d-random-contributor

don't think allowing any mutation is a good idea, it's extremely dependent on implementation details.

Pinging our AA experts @MartinNowak @quickfur @rainers @schveiguy @yebblies for opinion.

PetarKirov avatar Jun 15 '17 10:06 PetarKirov

No, you can't add or remove keys during iteration. See here: https://github.com/dlang/druntime/blob/master/src/rt/aaA.d#L680

Let's say you add a key, it causes a rehash. Your range is now pointing at an unfilled bucket, breaking the implied invariant of the range (it's either empty or pointing at a valid element).

You can change values of keys, as that does not affect the structure of the AA.

schveiguy avatar Jun 15 '17 10:06 schveiguy

I should add, the text says "value of key", just like I did above, but first time I read this, I thought it meant you can change the key itself while it's still in the AA. It should be reworded.

schveiguy avatar Jun 15 '17 11:06 schveiguy

as that does not affect the structure of the AA

I had an idea that hashtable DoS attack can be prevented by counting hash misses: when the number reaches some threshold value, the AA is rehashed with new hash seed. In this implementation setter can rehash the AA.

d-random-contributor avatar Jun 15 '17 11:06 d-random-contributor

In this implementation setter can rehash the AA.

This is dangerous. Lookups should not rehash.

auto v1 = key1 in aa;
auto v2 = key2 in aa; // invalidates v1?

schveiguy avatar Jun 15 '17 11:06 schveiguy

Actually in current implementation, if you are pointing at an element, rehashing doesn't affect that. But the range isn't pointing at an element, it's pointing at the impl, and stores an idx of where it is. I still don't like the idea of rehashing for hash misses, at least not in the default AA implementation. DoS attacks are only going to happen in certain circumstances, you can use a specialized type for that.

schveiguy avatar Jun 15 '17 11:06 schveiguy

invalidates v1?

Probably shouldn't. Is it documented?

d-random-contributor avatar Jun 15 '17 11:06 d-random-contributor

The exact behavior is not documented. In general, I would say anything that causes a rehash would certainly invalidate the range. Whether it would invalidate a reference to a value is up for debate. As I said, currently it does not. Changing that aspect might be very bad for some people.

I would also say that changing the value mapped to an existing key should be documented not to cause a rehash. Neither should looking for an existing key in the hashmap. Too much code depends on this kind of thing.

schveiguy avatar Jun 15 '17 12:06 schveiguy

Apart from random results of the iteration after insert or remove, the current range implementation is not perfectly safe with respect to memory corruption. After rehashing as part of aa.remove, the idx variable might point beyond the end of the buckets (resulting in bad empty-state and RangeErrors most of the time). It might also point to an empty bucket, and _aaRangeFrontValue will return an invalid non-null pointer.

rainers avatar Jun 15 '17 12:06 rainers

_aaRangeFrontValue will return an invalid non-null pointer

Haha, yes, that is probably something we could fix easily, as the key is a null pointer (can check for that before adding the offset). Note that in most cases, accessing the invalid pointer should still be a segfault (which is somewhat safe).

schveiguy avatar Jun 15 '17 12:06 schveiguy

_aaRangeFrontValue will return an invalid non-null pointer

Hmm... also noting that the front call in object.d is marked @trusted. That should be fixed, I'll file an issue.

Update: https://issues.dlang.org/show_bug.cgi?id=17507

schveiguy avatar Jun 15 '17 12:06 schveiguy

Haha, yes, that is probably something we could fix easily,

I'm not sure a null pointer is a reasonable result. Is it ok (safe?) to have an iterator randomly continue, crash with null pointer access or produce a range error?

rainers avatar Jun 15 '17 13:06 rainers

The key parameter should not be passed to foreach loops by ref, so your hopefully only mutating your copy of it.

MartinNowak avatar Jun 15 '17 14:06 MartinNowak

Yes, deletion can trigger a shrink of the table and rehashing of all remaining keys, which does invalidate the iteration index.

I'm not sure a null pointer is a reasonable result. Is it ok (safe?) to have an iterator randomly continue, crash with null pointer access or produce a range error?

Yes, neither continuing iteration with a random element, nor deterministically crashing with null pointer, nor exiting with a RangeError do corrupt memory, so those would be @safe options. Not sure that the current implementation does any of those though, but we should make it somehow @safe and guarantee that.

MartinNowak avatar Jun 15 '17 14:06 MartinNowak

The key parameter should not be passed to foreach loops by ref, so your hopefully only mutating your copy of it.

I'm guessing you misunderstood "value of key" like I did. He means you can change the value associated with the key, not the key itself.

schveiguy avatar Jun 15 '17 15:06 schveiguy

This comes up every so often, the idea of mutating a container while iterating over it. The fact of the matter is, this will almost always lead to behaviour that most people would agree is strange, counter-intuitive, or otherwise pathological, unless (1) you only ever delete the current element during the iteration, and (2) the container was written explicitly to allow for this case. This applies to containers in general, but especially to unordered ones like AA's.

Consider an arbitrary AA (not necessary the current D implementation) that you're trying to iterate over. Suppose the iteration algorithm you're using produces the sequence of keys k1, k2, k3, ... kn. Let's say you're currently on k3, and then your code decides to insert a new key kx. The question is: should this new key be visible on this current round of iteration, or should it only be visible the next time you iterate the AA?

No matter which way you answer, there is always a case that will break your expectations, because AA's being unordered means that kx could end up before k3 or after k3, and you can't predict this beforehand (without going under the hood, of course). Furthermore, the act of inserting kx may trigger a rehash of the AA, thereby reordering k1...kn. Potentially, it could reorder it such that the iteration sequence now becomes k4, k2, k3, k1, ... kx, kn. So now, your overall sequence becomes k1, k2, k3, k1, ... kx, kn, because you already iterated over k1 and k2 before reaching k3, but due to the rehash, moving forward from k3 brings you back to k1, and then diverges into a different iteration sequence than it would have gone before. (And note that k4 now will never be visited on this round of iteration, because it had the bad luck of being rehashed before your current element.)

This is pathological behaviour, because when you write foreach (k, v; aa), the expectation is that you'll only visit each element once, in some arbitrary order. But modifying the AA in the interim may break this expectation: you may end up visiting some elements multiple times, and some elements may be skipped over, and the new element may or may not be visited in this round of iteration.

The same argument goes for deleting elements from the AA while iterating over it.

You could say, what if I create a list of keys from the AA first, then iterate over that instead? You could -- but then if you add new elements, they will not be visible until the next round of iteration, and if you delete elements, your list of keys may contain invalid keys. And if you delete and insert keys, you may end up with new keys that will appear in the current iteration because it coincides with the deleted key. This may or may not be your intention. Not to mention the inefficiency of creating a list of keys prior to iteration.

Basically, modifying a container while iterating over it is a Bad Idea. Unless you absolutely know what you're doing and want the exact resulting semantics, I'd highly recommend not doing this. The only exception, as I said, is when you only ever delete an element in the current position, and the container explicitly supports this operation (and guarantees that iteration order won't suddenly randomize on you, or suddenly stop because the current node's next pointer is now null).

quickfur avatar Jun 15 '17 19:06 quickfur

Because of this, I recommend not explicating what happens with mutating a container while iterating over it -- rather, the user should be discouraged from doing this. The exact behaviour will depend on the implementation details of the container, so putting this in the docs and having user code rely on the current behaviour will limit the ability to change the container's implementation later. This is leaky abstraction, it's a bad idea.

quickfur avatar Jun 15 '17 19:06 quickfur

@quickfur that's all good, but modifying a value that's in an AA is expected to do nothing to the structure. It's perfectly acceptable, and I would say many people rely on it. Inserting/removing keys is what is bad. We should define and document this and call it a day.

schveiguy avatar Jun 15 '17 21:06 schveiguy

Changing the value in an AA is perfectly fine, obviously.

What's not OK is changing the key. The proposed change in this PR "officially" sanctions changing the current key, mentioning no worse side effect than "the key may not be part of the current iteration". This is an understatement; adding or deleting keys while iterating over an AA can potentially completely screw up the current iteration in the sense I described above, that you may start seeing keys you've already traversed or skipping over keys that you haven't seen yet in the current iteration.

Furthermore, the current behaviour is arguably specific to the current implementation; user code really should not be relying on such behaviour. I'd even go as far as saying that adding/deleting keys from the AA while iterating over it may result in UB. At least, it will result in an undefined subsequent sequence of keys traversed by the current iteration.

quickfur avatar Jun 15 '17 21:06 quickfur

Also, note that mutating the key in an AA will cause bugs, because the hash value will not be updated, so the AA becomes internally inconsistent, and may start accepting duplicate keys and not "seeing" an existing key.

quickfur avatar Jun 15 '17 21:06 quickfur

I don't agree that the docs should explicitly legalize mutating the AA while iterating over it.

@ all thanks a lot for your very active discussion and feedback. I went with the initial wording, because that's how the bug report described the "problem"

Anyhow I have tried to incorporate your feedback into a rewording of this clarification. Please feel free to change, improve and/or nitpick!

wilzbach avatar Jun 15 '17 22:06 wilzbach