PHP 7 - Wrong Vector/Map/Set equality
I think creating a different issue from #65 is a better idea.
Current problem
var_dump(new \Ds\Set([1, 3]) == new \Ds\Set([1, 2, 3])); // true
it makes me very sad as I definitively can't rely on Ds in my tests. And I use Ds everywhere as it does a really nice job (thanks for that anyway)
Solution
Is there any reason why this behavior ? And is it a bug or a known behavior ? Looks too big that anyone complain about it ^^
Whatever I'm definitively open to contribute event if my C skill are very old because it cannot stay like that.
I already had a look on the (clean) code.
I guess we need to implements the compare_objects handlers on the internal htable ? Looks pretty obvious, I know. May be we can even use the toArray and rely on array::compare_objects ?
Is there any stuff that prevent us to do that ?
Anyway, let me know what I can do to help on this very annoying issue.
Thanks
@tyx we would need to implement compare_objects on the zend objects, ie. php_ds_set etc. The internal htable would do the actual work though, eg.:
compare_objectshandler- Check if the object is an instance of Set
- Access the internal set
ds_set_equals(ds_set_t *a, ds_set_t* b)ds_htable_equals(ds_htable_t *a, ds_htable_t *b)
I'm guessing we'd iterate through both the table's keys and values and check if they're equal? (Or in the case of a Set, only the keys as all the values would be undefined).
toArray would be a performance hit, might also run into non-scalar keys.
Sorry about this annoying issue - keen to get it implemented and tested though.
There's also the obvious win if the object being compared to is the same instance. ⚡️
Looks nice ! Thanks you for your fast answer.
toArray would be a performance hit, might also run into non-scalar keys.
Indeed !
Sorry about this annoying issue - keen to get it implemented and tested though.
Compared to the benefits we get to use your extension, it's not a big deal in fact ;)
There's also the obvious win if the object being compared to is the same instance.
Yes ! This one works actually
@nikic would this actually be implicit if we implement the get_properties handler?
@rtheunissen Yes-ish. I'd assume you wouldn't want to have the same comparison semantics as a property comparison would give you. In particular, those would be weak and I doubt you'd want to have Set(["foo"]) == Set([true]), right?
The only issue here is that we can't implement this in the polyfill. :no_good_man:
Only option would be to implement Hashable and use equals for equality checks. Would never need > and < anyway.
@rtheunissen ok so the main issue is equality so it's not a big issue about < and >. And yes why not with Hashable, we're ok that Set, Vector, ... don't implement it ATM? This is what you've planned to solve the issue.
@shouze happy to do it with equals. Added to 2.0, see #65
This is actually more difficult than I expected. equals as part of the Hashable interface sort of scopes that equality check to a context in which the object is used as a key in a hash table. I'm not sure if that also implies general equality, ie. a case where you'd check $a === $b.
For example, if we implement equals on Collection, and we're checking each item of the collection for strict equality, do we call equals on values that implement Hashable?
Or should we create an Equality interface and have Hashable just extend that?
First of all, to me identity comparison (===) is something else and not part of the scope of this issue.
For example, if we implement equals on Collection, and we're checking each item of the collection for strict equality, do we call equals on values that implement Hashable?
Yes why not.
Or should we create an Equality interface and have
Hashablejust extend that?
And yes why not, I'm not sure that it's needed but could be a nice to have.
In fact I wonder about the way you want to proceed:
- do you want to lookup each entries of eg a
Collectionwhen we want to check that a collection is equal to another? - Or do you plan to maintain an always up2date internal hash, updated each time we
mutatethe collection?
Do you want to lookup each entries of eg a Collection when we want to check that a collection is equal to another?
Yes I think so. Something similar to what lodash does with _.equals. We can fail fast with things like type and size comparisons. An interesting case would be a Map, which may not have the same ordering, but could be considered "equal". {a: 1, b: 2} == {b: 2, a: 1} for example. We would have to define this for each collection.
Or do you plan to maintain an always up2date internal hash, updated each time we mutate the collection?
No I don't think this is practical, better to just iterate through to compare.
The issue remains that we can't implement a == check in the polyfill.
Whenever you would use $set == $another, you would have to use $set->equals($another).
@rtheunissen yes ok for $set->equals($another) if it can help to maintain the polyfill at the same level of features alongside.
I'd like to keep them consistent, otherwise it invalidates the polyfill to some extent. Maybe if and when the extension becomes a default, we can drop the polyfill.
Yes, can be a scheduled deprecation but you're right it's probably not the time to do that as it would break adoption.
This change is actually quite complex. Whenever values are compared (filter, sort, find, etc), we now have to check if the value implements Hashable, and use its Hashable::equals if it does. Of course it's easy to write a helper like valuesAreEqual that checks this for us, but the effects are clearly outside of the Hashable domain. I'm therefore convinced that we should move the equals method of Hashable to its own interface, maybe Equality, with only a single method equals.
It's a shame that PHP still doesn't have a Comparable interface or operator overloading. I'm tempted to wait for the Comparable RFC to pass and be released, which would make the Equality interface redundant.
You can use something like $a->diff($b)->isEmpty() but that wouldn't currently call equals on any objects in the set that implement Hashable. If you know you only have scalar values in your set, you can wrap it and implement your own equals for now.
We're are considering to replace our custom collection structures by DS, but we use collection comparisons heavily.
Java defines specific strategies for computing hash codes for each type of collection. For maps, for instance, the hash code is order insensitive (hash += hash(key) ^ hash(value)), so that the equality relation is valid across all map implementations.
Our collection library has an Equality interface, from which Hashable extends from. It makes sense for us since the concept of hash only makes sense tied to an equality relation. Defining a concept of equality relation between collections also requires defining how the hash codes should be computed consistently as per collection type (it should be part of the interface contract). In Java, for instance, two maps are equal if both have the same EntrySet, while two sets are equal if both have the same elements. The rule for determining if two objects are equals is as follows:
- Two identical objects are always equal
- Two Hashable objects
$aand$b, are equal if$a->equals($b)and$b->equals($a) - Everything else is unequal
All methods should follow these rules, including diff, intersect, remove, etc.
Despite the issues related to the implementation of operation comparison overloading, it would be really helpful to have an equals method that allows us testing collections for equality. It is not a BC break and does not require complex changes as implementing operation overloading does.
About the Comparable RFC, it's an old discussion and I don't believe it will be added to the core anytime soon.
It's is not a BC break and does not require complex changes as implementing operation overloading does.
That's true, but changing the behaviour of diff, intersect, remove, etc could be considered significant enough to add as part of 2.0 (end 2017).
I agree that the collections should implement Hashable, and I think following the direction you've laid out makes sense. No need for the overloaded op just yet, that can come later.
@rtheunissen thanks for considering it! Indeed, changing the behavior of these methods does represent a BC break, but introducing the equals method does not. Any chance of having it before the 2.0 release? Looking the issues, it is probably the most requested feature.
So we're implementing an equals method, but not Hashable yet?
Why not Hashable? We can just leave diff, intersect, remove untouched until 2.0. For now, it should be possible to implement equals and hash without introducing a BC break, right?
But then Map and Set will try to use them as Hashable's, and not spl_object_hash, which could break things. We can add an equals method that defines equality, but implementing Hashable is a 2.0 addition.
Right, I missed it. Yes, equals will help a lot for now. Just to be sure, how do you plan to implement this algorithm? Performance is always a problem in collection comparisons. What I've in mind is something like the following algorithm, suitable for the polyfill, but obviously written in C:
public function equals($other): bool
{
if (!($other instanceof Map)){
return false;
}
if ($this->count() !== $other->count()) {
return false;
}
foreach ($this->map as $key => $value) {
if (!$other->hasKey($key)) {
return false;
}
/**
* isEqual()
* - Two identical objects are always equal
* - Two Hashable objects $a and $b, are equal if $a->equals($b) and $b->equals($a)
* - Everything else is unequal
*/
if (!$this->isEqual($value, $other->get($key))) {
return false;
}
}
return true;
}
So {a: 1, b:2} === {b:2, a: 1} ? I'm considering making the comparison order-sensitive, because it's an ordered collection, unlike a Java Map (excl ordered map).
If ordering is considered, we can do a simple O(n) comparison from 0 to n - 1. If ordering is not considered, then your example is spot on, O(n log n).
From Java docs:
Compares the specified object with this map for equality. Returns true if the given object is also a map and the two maps represent the same mappings. More formally, two maps m1 and m2 represent the same mappings if m1.entrySet().equals(m2.entrySet()). This ensures that the equals method works properly across different implementations of the Map interface.
So, in Java ordered and unordered collections and considered equal.
However, I like your idea. We could further improve the collection by introducing a new methodcontainsAll(Map $map), so that $left->count() === $right->count() && $left->containsAll($right) is the same of checking for order insensitive equality. In that way we provide an alternative for performing both checks.
What if we added a bool arg to xor, diff and intersect to also match on the value. That way, to determine if a Map is equal you can check if count($map1->xor($map2, true)) == 0 ? I think that works?
Can always have a bool arg in equals, which won't break the Hashable contract. bool $ordered = false? I really don't like boolean arguments though because their function is not always clear from the caller's end.
I think we should keep it simple here. Check all pairs and make the same ordering a requirement for equality. If you have a potentially different order, sort first then check check equality.
I think it can quickly become unmaintainable. So, the behavior of these flags set to false is the same as not implementing the Hashable interface. In other words, if an object implements the Hashable interface it is explicitly defining the standard way for comparing two instances of that type. I think that we should enforce adherence to the protocol. Furthermore, if one wants this behavior and there is no big perfomance gain between the native array and the map, there is no reason for using the map at all.
Java's argument for not considering ordering a part of the criteria for equality is based on the fact that their collections aren't ordered in the first place (bar a few). What's interesting to note is that Java doesn't consider ordering even when using collections like LinkedMap. See http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/8u40-b25/java/util/LinkedHashMap.java