ext-ds
ext-ds copied to clipboard
Modification during iteration
It's currently possible to modify / update a collection during iteration. This leads to undefined behaviour, because the internal iterator is not aware of these modifications. There are three possible solutions here:
- Keep a modcount, throw exception during iterator validation if modcount is different.
- Keep an iterator count on the structure, and throw an exception during updates if > 0.
- Leave as is, it's the user's responsibility to be sensible.
Could also just raise a warning instead of throwing an exception.
@krakjoe, curious to hear your thoughts. :bulb:
I'm leaning towards 1, as a modcount can also be used to make other shortcuts, such as having sort reset the modcount to 0, and have modcount == 0 imply a sorted collection. (Just an idea)
The overhead is similar between 1 and 2, where in the former you increment during updates and compare during validation, and in the latter you increment during iterator init but compare during updates.
However, shouldn't sorting also be forbidden while iterating? If you mix modcount usage for both, this will not be properly enforced, right?
Sorting wouldn't be allowed for obvious reasons. You could therefore add an element, then sort, and the iterator wouldn't know about it. So it's a flawed idea, unless sort uses a 'last sorted modcount' rather than updating the modcount.
That was just an example though, a potential bonus.
I think modifying values during iteration should be fine. The only concern here is modifying the actual collection- i.e., changing the size of the collection.
I feel like it would be fairly straightforward to template behaviour off Java in this instance. https://docs.oracle.com/javase/7/docs/api/java/util/ConcurrentModificationException.html
Going with the "please be sensible" option here, it's on the user for now until this causes a problem that can't be avoided or is a big risk. Maybe 2.0 can see a modification guard that is incremented and decremented by the iterator, and all functions that modify should check the guard first.
I think this does need to be addressed with explicit exceptions. I'm currently in the process of migrating to Map from array on a project, and I ran into this unexpected behaviour on investigating a bug:
$m = new \Ds\Map();
for($i = 0; $i < 100000; ++$i){
$m[$i] = $i;
}
foreach($m as $k => $v){
unset($m[$k]);
}
var_dump(count($m));
outputs int(50000) instead of int(0). I did not realize this code wasn't supported until finding this issue.
I agree and will work to implement this asap. :+1:
I'm sure I replied to your question outside of github ... but in case I didn't ... 1) ...
@krakjoe I was actually thinking 2. We can assume that there will be more modifications than iterator instances, so was thinking an if (UNEXPECTED(obj->iteratorCount)) { throw } on update could be faster than obj->modcount++; which would be checked during each step of the iteration? (Not that that is reason enough / significant).
Modcount:
- Increment on update
- Set starting modcount on iterator ctor
- Check if modcount != starting modcount in
valid?
Iterator count:
- Check if iterator count > 0 on update
- Increment iterator count on iterator init
- Decrement iterator count on iterator dtor
There's also the possibility of modcount overflowing, though it would always be non-zero in that case, right?
Completely disallowing modifications during iteration sounds more desirable, because it'll make it easier to find where a bug is coming from.
If one part of user code adds a new hashtable key during iteration (without realizing it wasn't just overwriting) and the code only crashes later on when the next iterator next occurs, this is going to create some very hard-to-find bugs. The user will know an illegal modification occurred, but not where it occurred.
With that said, it's also possible to just disallow size-changing modifications if iteratorCount > 0 I guess. I strongly feel that any exception should be thrown at the point of attempted modification though.
I agree with that @dktapps
So long as this is true:
I strongly feel that any exception should be thrown at the point of attempted modification though.
It's all good ...
I have a draft implementation here: https://github.com/php-ds/ext-ds/commit/76eec5336efc35cbbe4880f3f537ba11db19bf74
This follows @dktapps suggestion of size-changing modifications, which makes sense to me. I don't think we should disallow all modifications - only those that would corrupt the iterator.
The list I came up with and applied is: Sequence:
- push
- pop
- shift
- unshift
- clear
Map:
- put
- putAll
- clear
Set:
- add
- remove
- clear
Stack/Queue
- push
- pop
- clear
This allows us to still do stuff like:
foreach ($sequence as $index => $value) {
$sequence[$index - 1] = $value;
}
I will leave this open for discussion, but will clean up and release at some stage if no changes are suggested. I will need to update the tests and the polyfill also, but those are easy to do.
This will be implemented in 2.0 using copy-on-write for internal buffers.
If I understand you correctly @rtheunissen that means the iteration behaviour will be consistent with arrays (iterating on a copy, modification of original permitted), did I understand that correctly?
That is correct.
@dktapps it also means that a clone does not clone the data until the cloned object is written to. It creates a new instance which shares the same data, but is separated from other containers when modified.
Still valid for 1.x
is it safe to use IteratorIterator($collection) while modifying the $collection?
as i know, ArrayIterator can do this, why Collection cannot?
doesn't IteratorIterator create its own iterator or is it using the collection's one?
e.g. if i want:
$iterator = new IteratorIterator($map);
$iterator->rewind();
$iterator->current();
$iterator->next();
$iterator->current();
$map->put($key, $value);
$iterator->next();
$iterator->current();
my keys implement Hashable, so if Collection cannot do this, i'll have to use ArrayIterator and instead of $map[$key] = $value; i will use $arrayIterator[$key->hash()] = $value; directly.
I just don't get why Collection cannot simply do what ArrayIterator can
ArrayIterator can do this, why Collection cannot?
This is because arrays are copy-on-write/persistent and ext-ds structures do not support this yet. When you iterate an array it increases its reference count, so that when you modify the array it actually creates a copy of the entire array and the modification is done on the copy. The iteration will still continue on the previous version of the array.
doesn't IteratorIterator create its own iterator or is it using the collection's one?
I think it behaves a bit like this:
https://3v4l.org/G3cI7
<?php
// Collection
$collection = [1, 2, 3];
// IteratorIterator
$generator = function (iterable $i) {
yield from $i;
};
// new IteratorIterator
$iterator = $generator($collection);
var_dump($iterator->current());
my keys implement Hashable, so if Collection cannot do this, i'll have to use ArrayIterator and instead of $map[$key] = $value; i will use $arrayIterator[$key->hash()] = $value; directly.
I would strongly suggest against this pattern. The hash is not the key. Multiple keys that are not equal at all can share the same hash.
I just don't get why Collection cannot simply do what ArrayIterator can
Because arrays are copy-on-write/persistent. ext-ds 2.0 will focus on persistence, but there are currently no plans to backport that to the 1.x branch.
If you must modify a collection that may be iterated somewhere, you should clone it before the iteration. However, creating a full copy every time is a major waste of resources so this should only be seen as a temporary solution while 2.x is being developed.
my keys implement Hashable, so if Collection cannot do this, i'll have to use ArrayIterator and instead of $map[$key] = $value; i will use $arrayIterator[$key->hash()] = $value; directly.
I would strongly suggest against this pattern. The
hashis not the key. Multiple keys that are not equal at all can share the same hash.
idk if i'm doing this right but.. according to ddd, i have an Entity and an EntityId classes. something like this:
class Entity { private EntityId $entityId; }
class EntityId { private int $integerId; }
in this case, $entity->entityId->integerId is both the hash and the unique identifier (unique amongst Entity instances).
I understand that "hash is not the key" but what is wrong with "the key is a hash"?
Or did you mean I should correct it in such a way: $arrayIterator[(string) $key->integerId] = $value?
@rtheunissen thank you for your replies, it was interesting and informative to read them 👍
ArrayIterator can do this, why Collection cannot? This is because arrays are copy-on-write/persistent and ext-ds structures do not support this yet.
fyi, i've just got to know, that in php 7.3 ArrayIterator has a bug and cannot do this too. e.g.: https://3v4l.org/Ft0Dh
Output for 7.3.0 - 7.3.4
15: valid(): bool(false)
16: current(): NULL
17: offsetSet(5, 5): NULL
18: valid(): bool(true)
19: current(): UNKNOWN:0
Notice: Undefined variable: s in /in/2Aelr on line 5
20: :
after returning an undefined value on current() call, it stops the iteration even though i append new values
@githubeing Please create a report on bugs.php.net. That UNKNOWN:0 is definitely a bug.
@githubeing
I understand that "hash is not the key" but what is wrong with "the key is a hash"?
In your specific case, it seems like the hash is unique so could be used as a key, sure. It is not recommended when you can not guarantee that a hash will be unique.
@githubeing if you use a generator rather than an IteratorIterator and you clone the structure before you iterate it, does that solve your use case? I know that can be an expensive operation (the clone every time), but it may carry you until we implement an improvement around iteration.
@githubeing Please create a report on bugs.php.net. That UNKNOWN:0 is definitely a bug.
i've done: https://bugs.php.net/bug.php?id=77903
you may vote for it if you want btw