ext-ds icon indicating copy to clipboard operation
ext-ds copied to clipboard

Modification during iteration

Open rtheunissen opened this issue 8 years ago • 42 comments

It's currently possible to modify / update a collection during iteration. This leads to undefined behaviour, because the internal iterator is not aware of these modifications. There are three possible solutions here:

  1. Keep a modcount, throw exception during iterator validation if modcount is different.
  2. Keep an iterator count on the structure, and throw an exception during updates if > 0.
  3. Leave as is, it's the user's responsibility to be sensible.

Could also just raise a warning instead of throwing an exception.

@krakjoe, curious to hear your thoughts. :bulb:

rtheunissen avatar Mar 06 '16 04:03 rtheunissen

I'm leaning towards 1, as a modcount can also be used to make other shortcuts, such as having sort reset the modcount to 0, and have modcount == 0 imply a sorted collection. (Just an idea)

The overhead is similar between 1 and 2, where in the former you increment during updates and compare during validation, and in the latter you increment during iterator init but compare during updates.

rtheunissen avatar Mar 06 '16 04:03 rtheunissen

However, shouldn't sorting also be forbidden while iterating? If you mix modcount usage for both, this will not be properly enforced, right?

nikic avatar Mar 06 '16 12:03 nikic

Sorting wouldn't be allowed for obvious reasons. You could therefore add an element, then sort, and the iterator wouldn't know about it. So it's a flawed idea, unless sort uses a 'last sorted modcount' rather than updating the modcount.

That was just an example though, a potential bonus.

rtheunissen avatar Mar 06 '16 20:03 rtheunissen

I think modifying values during iteration should be fine. The only concern here is modifying the actual collection- i.e., changing the size of the collection.

I feel like it would be fairly straightforward to template behaviour off Java in this instance. https://docs.oracle.com/javase/7/docs/api/java/util/ConcurrentModificationException.html

john-wilkinson avatar May 02 '16 13:05 john-wilkinson

Going with the "please be sensible" option here, it's on the user for now until this causes a problem that can't be avoided or is a big risk. Maybe 2.0 can see a modification guard that is incremented and decremented by the iterator, and all functions that modify should check the guard first.

rtheunissen avatar Oct 17 '18 17:10 rtheunissen

I think this does need to be addressed with explicit exceptions. I'm currently in the process of migrating to Map from array on a project, and I ran into this unexpected behaviour on investigating a bug:

$m = new \Ds\Map();
for($i = 0; $i < 100000; ++$i){
	$m[$i] = $i;
}

foreach($m as $k => $v){
	unset($m[$k]);
}
var_dump(count($m));

outputs int(50000) instead of int(0). I did not realize this code wasn't supported until finding this issue.

dktapps avatar Jan 26 '19 14:01 dktapps

I agree and will work to implement this asap. :+1:

rtheunissen avatar Jan 27 '19 07:01 rtheunissen

I'm sure I replied to your question outside of github ... but in case I didn't ... 1) ...

krakjoe avatar Jan 27 '19 07:01 krakjoe

@krakjoe I was actually thinking 2. We can assume that there will be more modifications than iterator instances, so was thinking an if (UNEXPECTED(obj->iteratorCount)) { throw } on update could be faster than obj->modcount++; which would be checked during each step of the iteration? (Not that that is reason enough / significant).

Modcount:

  • Increment on update
  • Set starting modcount on iterator ctor
  • Check if modcount != starting modcount in valid?

Iterator count:

  • Check if iterator count > 0 on update
  • Increment iterator count on iterator init
  • Decrement iterator count on iterator dtor

There's also the possibility of modcount overflowing, though it would always be non-zero in that case, right?

rtheunissen avatar Jan 27 '19 08:01 rtheunissen

Completely disallowing modifications during iteration sounds more desirable, because it'll make it easier to find where a bug is coming from.

If one part of user code adds a new hashtable key during iteration (without realizing it wasn't just overwriting) and the code only crashes later on when the next iterator next occurs, this is going to create some very hard-to-find bugs. The user will know an illegal modification occurred, but not where it occurred.

dktapps avatar Jan 27 '19 08:01 dktapps

With that said, it's also possible to just disallow size-changing modifications if iteratorCount > 0 I guess. I strongly feel that any exception should be thrown at the point of attempted modification though.

dktapps avatar Jan 27 '19 08:01 dktapps

I agree with that @dktapps

rtheunissen avatar Jan 27 '19 09:01 rtheunissen

So long as this is true:

I strongly feel that any exception should be thrown at the point of attempted modification though.

It's all good ...

krakjoe avatar Jan 27 '19 10:01 krakjoe

I have a draft implementation here: https://github.com/php-ds/ext-ds/commit/76eec5336efc35cbbe4880f3f537ba11db19bf74

This follows @dktapps suggestion of size-changing modifications, which makes sense to me. I don't think we should disallow all modifications - only those that would corrupt the iterator.

The list I came up with and applied is: Sequence:

  • push
  • pop
  • shift
  • unshift
  • clear

Map:

  • put
  • putAll
  • clear

Set:

  • add
  • remove
  • clear

Stack/Queue

  • push
  • pop
  • clear

This allows us to still do stuff like:

foreach ($sequence as $index => $value) {
    $sequence[$index - 1] = $value;
}

rtheunissen avatar Jan 30 '19 01:01 rtheunissen

I will leave this open for discussion, but will clean up and release at some stage if no changes are suggested. I will need to update the tests and the polyfill also, but those are easy to do.

rtheunissen avatar Jan 30 '19 01:01 rtheunissen

This will be implemented in 2.0 using copy-on-write for internal buffers.

rtheunissen avatar Mar 13 '19 22:03 rtheunissen

If I understand you correctly @rtheunissen that means the iteration behaviour will be consistent with arrays (iterating on a copy, modification of original permitted), did I understand that correctly?

dktapps avatar Mar 14 '19 08:03 dktapps

That is correct.

rtheunissen avatar Mar 14 '19 13:03 rtheunissen

@dktapps it also means that a clone does not clone the data until the cloned object is written to. It creates a new instance which shares the same data, but is separated from other containers when modified.

rtheunissen avatar Mar 14 '19 14:03 rtheunissen

Still valid for 1.x

rtheunissen avatar Mar 14 '19 14:03 rtheunissen

is it safe to use IteratorIterator($collection) while modifying the $collection?

as i know, ArrayIterator can do this, why Collection cannot?

doesn't IteratorIterator create its own iterator or is it using the collection's one?

githubeing avatar Apr 14 '19 12:04 githubeing

e.g. if i want:

$iterator = new IteratorIterator($map);
$iterator->rewind();
$iterator->current();
$iterator->next();
$iterator->current();
$map->put($key, $value);
$iterator->next();
$iterator->current();

githubeing avatar Apr 14 '19 12:04 githubeing

my keys implement Hashable, so if Collection cannot do this, i'll have to use ArrayIterator and instead of $map[$key] = $value; i will use $arrayIterator[$key->hash()] = $value; directly.

I just don't get why Collection cannot simply do what ArrayIterator can

githubeing avatar Apr 14 '19 12:04 githubeing

ArrayIterator can do this, why Collection cannot?

This is because arrays are copy-on-write/persistent and ext-ds structures do not support this yet. When you iterate an array it increases its reference count, so that when you modify the array it actually creates a copy of the entire array and the modification is done on the copy. The iteration will still continue on the previous version of the array.

doesn't IteratorIterator create its own iterator or is it using the collection's one?

I think it behaves a bit like this:

https://3v4l.org/G3cI7

<?php

// Collection
$collection = [1, 2, 3];

// IteratorIterator
$generator = function (iterable $i) {
    yield from $i;
};

// new IteratorIterator
$iterator = $generator($collection);

var_dump($iterator->current());

my keys implement Hashable, so if Collection cannot do this, i'll have to use ArrayIterator and instead of $map[$key] = $value; i will use $arrayIterator[$key->hash()] = $value; directly.

I would strongly suggest against this pattern. The hash is not the key. Multiple keys that are not equal at all can share the same hash.

I just don't get why Collection cannot simply do what ArrayIterator can

Because arrays are copy-on-write/persistent. ext-ds 2.0 will focus on persistence, but there are currently no plans to backport that to the 1.x branch.

If you must modify a collection that may be iterated somewhere, you should clone it before the iteration. However, creating a full copy every time is a major waste of resources so this should only be seen as a temporary solution while 2.x is being developed.

rtheunissen avatar Apr 14 '19 15:04 rtheunissen

my keys implement Hashable, so if Collection cannot do this, i'll have to use ArrayIterator and instead of $map[$key] = $value; i will use $arrayIterator[$key->hash()] = $value; directly.

I would strongly suggest against this pattern. The hash is not the key. Multiple keys that are not equal at all can share the same hash.

idk if i'm doing this right but.. according to ddd, i have an Entity and an EntityId classes. something like this:

class Entity { private EntityId $entityId; }
class EntityId { private int $integerId; }

in this case, $entity->entityId->integerId is both the hash and the unique identifier (unique amongst Entity instances). I understand that "hash is not the key" but what is wrong with "the key is a hash"? Or did you mean I should correct it in such a way: $arrayIterator[(string) $key->integerId] = $value?

githubeing avatar Apr 14 '19 17:04 githubeing

@rtheunissen thank you for your replies, it was interesting and informative to read them 👍

githubeing avatar Apr 14 '19 17:04 githubeing

ArrayIterator can do this, why Collection cannot? This is because arrays are copy-on-write/persistent and ext-ds structures do not support this yet.

fyi, i've just got to know, that in php 7.3 ArrayIterator has a bug and cannot do this too. e.g.: https://3v4l.org/Ft0Dh

Output for 7.3.0 - 7.3.4
15:         valid(): bool(false)
16:       current(): NULL
17: offsetSet(5, 5): NULL
18:         valid(): bool(true)
19:       current(): UNKNOWN:0

Notice: Undefined variable: s in /in/2Aelr on line 5
20:                : 

after returning an undefined value on current() call, it stops the iteration even though i append new values

githubeing avatar Apr 15 '19 17:04 githubeing

@githubeing Please create a report on bugs.php.net. That UNKNOWN:0 is definitely a bug.

nikic avatar Apr 15 '19 17:04 nikic

@githubeing

I understand that "hash is not the key" but what is wrong with "the key is a hash"?

In your specific case, it seems like the hash is unique so could be used as a key, sure. It is not recommended when you can not guarantee that a hash will be unique.

@githubeing if you use a generator rather than an IteratorIterator and you clone the structure before you iterate it, does that solve your use case? I know that can be an expensive operation (the clone every time), but it may carry you until we implement an improvement around iteration.

rtheunissen avatar Apr 15 '19 18:04 rtheunissen

@githubeing Please create a report on bugs.php.net. That UNKNOWN:0 is definitely a bug.

i've done: https://bugs.php.net/bug.php?id=77903

you may vote for it if you want btw

githubeing avatar Apr 15 '19 18:04 githubeing