ext-ds icon indicating copy to clipboard operation
ext-ds copied to clipboard

PHP 7 - Wrong Vector/Map/Set equality

Open tyx opened this issue 9 years ago • 66 comments

I think creating a different issue from #65 is a better idea.

Current problem

var_dump(new \Ds\Set([1, 3]) == new \Ds\Set([1, 2, 3])); // true

it makes me very sad as I definitively can't rely on Ds in my tests. And I use Ds everywhere as it does a really nice job (thanks for that anyway)

Solution

Is there any reason why this behavior ? And is it a bug or a known behavior ? Looks too big that anyone complain about it ^^

Whatever I'm definitively open to contribute event if my C skill are very old because it cannot stay like that.

I already had a look on the (clean) code. I guess we need to implements the compare_objects handlers on the internal htable ? Looks pretty obvious, I know. May be we can even use the toArray and rely on array::compare_objects ? Is there any stuff that prevent us to do that ?

Anyway, let me know what I can do to help on this very annoying issue.

Thanks

tyx avatar Nov 15 '16 17:11 tyx

@tyx we would need to implement compare_objects on the zend objects, ie. php_ds_set etc. The internal htable would do the actual work though, eg.:

  • compare_objects handler
  • Check if the object is an instance of Set
  • Access the internal set
  • ds_set_equals(ds_set_t *a, ds_set_t* b)
  • ds_htable_equals(ds_htable_t *a, ds_htable_t *b)

I'm guessing we'd iterate through both the table's keys and values and check if they're equal? (Or in the case of a Set, only the keys as all the values would be undefined).

toArray would be a performance hit, might also run into non-scalar keys.

Sorry about this annoying issue - keen to get it implemented and tested though.

There's also the obvious win if the object being compared to is the same instance. ⚡️

rtheunissen avatar Nov 16 '16 00:11 rtheunissen

Looks nice ! Thanks you for your fast answer.

toArray would be a performance hit, might also run into non-scalar keys.

Indeed !

Sorry about this annoying issue - keen to get it implemented and tested though.

Compared to the benefits we get to use your extension, it's not a big deal in fact ;)

There's also the obvious win if the object being compared to is the same instance.

Yes ! This one works actually

tyx avatar Nov 16 '16 06:11 tyx

@nikic would this actually be implicit if we implement the get_properties handler?

rtheunissen avatar Nov 16 '16 20:11 rtheunissen

@rtheunissen Yes-ish. I'd assume you wouldn't want to have the same comparison semantics as a property comparison would give you. In particular, those would be weak and I doubt you'd want to have Set(["foo"]) == Set([true]), right?

nikic avatar Nov 16 '16 20:11 nikic

The only issue here is that we can't implement this in the polyfill. :no_good_man:

rtheunissen avatar Nov 28 '16 08:11 rtheunissen

Only option would be to implement Hashable and use equals for equality checks. Would never need > and < anyway.

rtheunissen avatar Nov 28 '16 08:11 rtheunissen

@rtheunissen ok so the main issue is equality so it's not a big issue about < and >. And yes why not with Hashable, we're ok that Set, Vector, ... don't implement it ATM? This is what you've planned to solve the issue.

shouze avatar Nov 28 '16 08:11 shouze

@shouze happy to do it with equals. Added to 2.0, see #65

rtheunissen avatar Nov 28 '16 08:11 rtheunissen

This is actually more difficult than I expected. equals as part of the Hashable interface sort of scopes that equality check to a context in which the object is used as a key in a hash table. I'm not sure if that also implies general equality, ie. a case where you'd check $a === $b.

For example, if we implement equals on Collection, and we're checking each item of the collection for strict equality, do we call equals on values that implement Hashable?

Or should we create an Equality interface and have Hashable just extend that?

rtheunissen avatar Dec 11 '16 20:12 rtheunissen

First of all, to me identity comparison (===) is something else and not part of the scope of this issue.

For example, if we implement equals on Collection, and we're checking each item of the collection for strict equality, do we call equals on values that implement Hashable?

Yes why not.

Or should we create an Equality interface and have Hashable just extend that?

And yes why not, I'm not sure that it's needed but could be a nice to have.

In fact I wonder about the way you want to proceed:

  1. do you want to lookup each entries of eg a Collection when we want to check that a collection is equal to another?
  2. Or do you plan to maintain an always up2date internal hash, updated each time we mutate the collection?

shouze avatar Dec 15 '16 18:12 shouze

Do you want to lookup each entries of eg a Collection when we want to check that a collection is equal to another?

Yes I think so. Something similar to what lodash does with _.equals. We can fail fast with things like type and size comparisons. An interesting case would be a Map, which may not have the same ordering, but could be considered "equal". {a: 1, b: 2} == {b: 2, a: 1} for example. We would have to define this for each collection.

Or do you plan to maintain an always up2date internal hash, updated each time we mutate the collection?

No I don't think this is practical, better to just iterate through to compare.

rtheunissen avatar Feb 10 '17 11:02 rtheunissen

The issue remains that we can't implement a == check in the polyfill. Whenever you would use $set == $another, you would have to use $set->equals($another).

rtheunissen avatar Feb 10 '17 11:02 rtheunissen

@rtheunissen yes ok for $set->equals($another) if it can help to maintain the polyfill at the same level of features alongside.

shouze avatar Feb 10 '17 12:02 shouze

I'd like to keep them consistent, otherwise it invalidates the polyfill to some extent. Maybe if and when the extension becomes a default, we can drop the polyfill.

rtheunissen avatar Feb 10 '17 12:02 rtheunissen

Yes, can be a scheduled deprecation but you're right it's probably not the time to do that as it would break adoption.

shouze avatar Feb 10 '17 12:02 shouze

This change is actually quite complex. Whenever values are compared (filter, sort, find, etc), we now have to check if the value implements Hashable, and use its Hashable::equals if it does. Of course it's easy to write a helper like valuesAreEqual that checks this for us, but the effects are clearly outside of the Hashable domain. I'm therefore convinced that we should move the equals method of Hashable to its own interface, maybe Equality, with only a single method equals.

It's a shame that PHP still doesn't have a Comparable interface or operator overloading. I'm tempted to wait for the Comparable RFC to pass and be released, which would make the Equality interface redundant.

You can use something like $a->diff($b)->isEmpty() but that wouldn't currently call equals on any objects in the set that implement Hashable. If you know you only have scalar values in your set, you can wrap it and implement your own equals for now.

rtheunissen avatar Feb 12 '17 01:02 rtheunissen

We're are considering to replace our custom collection structures by DS, but we use collection comparisons heavily.

Java defines specific strategies for computing hash codes for each type of collection. For maps, for instance, the hash code is order insensitive (hash += hash(key) ^ hash(value)), so that the equality relation is valid across all map implementations.

Our collection library has an Equality interface, from which Hashable extends from. It makes sense for us since the concept of hash only makes sense tied to an equality relation. Defining a concept of equality relation between collections also requires defining how the hash codes should be computed consistently as per collection type (it should be part of the interface contract). In Java, for instance, two maps are equal if both have the same EntrySet, while two sets are equal if both have the same elements. The rule for determining if two objects are equals is as follows:

  • Two identical objects are always equal
  • Two Hashable objects $a and $b, are equal if $a->equals($b) and $b->equals($a)
  • Everything else is unequal

All methods should follow these rules, including diff, intersect, remove, etc.

Despite the issues related to the implementation of operation comparison overloading, it would be really helpful to have an equals method that allows us testing collections for equality. It is not a BC break and does not require complex changes as implementing operation overloading does.

About the Comparable RFC, it's an old discussion and I don't believe it will be added to the core anytime soon.

marcospassos avatar Nov 09 '17 19:11 marcospassos

It's is not a BC break and does not require complex changes as implementing operation overloading does.

That's true, but changing the behaviour of diff, intersect, remove, etc could be considered significant enough to add as part of 2.0 (end 2017).

I agree that the collections should implement Hashable, and I think following the direction you've laid out makes sense. No need for the overloaded op just yet, that can come later.

rtheunissen avatar Nov 09 '17 23:11 rtheunissen

@rtheunissen thanks for considering it! Indeed, changing the behavior of these methods does represent a BC break, but introducing the equals method does not. Any chance of having it before the 2.0 release? Looking the issues, it is probably the most requested feature.

marcospassos avatar Nov 10 '17 00:11 marcospassos

So we're implementing an equals method, but not Hashable yet?

rtheunissen avatar Nov 10 '17 00:11 rtheunissen

Why not Hashable? We can just leave diff, intersect, remove untouched until 2.0. For now, it should be possible to implement equals and hash without introducing a BC break, right?

marcospassos avatar Nov 10 '17 00:11 marcospassos

But then Map and Set will try to use them as Hashable's, and not spl_object_hash, which could break things. We can add an equals method that defines equality, but implementing Hashable is a 2.0 addition.

rtheunissen avatar Nov 10 '17 00:11 rtheunissen

Right, I missed it. Yes, equals will help a lot for now. Just to be sure, how do you plan to implement this algorithm? Performance is always a problem in collection comparisons. What I've in mind is something like the following algorithm, suitable for the polyfill, but obviously written in C:

public function equals($other): bool
{
    if (!($other instanceof Map)){
        return false;
    }

    if ($this->count() !== $other->count()) {
        return false;
    }

    foreach ($this->map as $key => $value) {
        if (!$other->hasKey($key)) {
            return false;
        }

        /**
         * isEqual()
         * - Two identical objects are always equal
         * - Two Hashable objects $a and $b, are equal if $a->equals($b) and $b->equals($a)
         * - Everything else is unequal
         */

        if (!$this->isEqual($value, $other->get($key))) {
            return false;
        }
    }

    return true;
}

marcospassos avatar Nov 10 '17 00:11 marcospassos

So {a: 1, b:2} === {b:2, a: 1} ? I'm considering making the comparison order-sensitive, because it's an ordered collection, unlike a Java Map (excl ordered map).

If ordering is considered, we can do a simple O(n) comparison from 0 to n - 1. If ordering is not considered, then your example is spot on, O(n log n).

rtheunissen avatar Nov 10 '17 00:11 rtheunissen

From Java docs:

Compares the specified object with this map for equality. Returns true if the given object is also a map and the two maps represent the same mappings. More formally, two maps m1 and m2 represent the same mappings if m1.entrySet().equals(m2.entrySet()). This ensures that the equals method works properly across different implementations of the Map interface.

So, in Java ordered and unordered collections and considered equal.

However, I like your idea. We could further improve the collection by introducing a new methodcontainsAll(Map $map), so that $left->count() === $right->count() && $left->containsAll($right) is the same of checking for order insensitive equality. In that way we provide an alternative for performing both checks.

marcospassos avatar Nov 10 '17 00:11 marcospassos

What if we added a bool arg to xor, diff and intersect to also match on the value. That way, to determine if a Map is equal you can check if count($map1->xor($map2, true)) == 0 ? I think that works?

rtheunissen avatar Nov 10 '17 00:11 rtheunissen

Can always have a bool arg in equals, which won't break the Hashable contract. bool $ordered = false? I really don't like boolean arguments though because their function is not always clear from the caller's end.

rtheunissen avatar Nov 10 '17 00:11 rtheunissen

I think we should keep it simple here. Check all pairs and make the same ordering a requirement for equality. If you have a potentially different order, sort first then check check equality.

rtheunissen avatar Nov 10 '17 00:11 rtheunissen

I think it can quickly become unmaintainable. So, the behavior of these flags set to false is the same as not implementing the Hashable interface. In other words, if an object implements the Hashable interface it is explicitly defining the standard way for comparing two instances of that type. I think that we should enforce adherence to the protocol. Furthermore, if one wants this behavior and there is no big perfomance gain between the native array and the map, there is no reason for using the map at all.

marcospassos avatar Nov 10 '17 00:11 marcospassos

Java's argument for not considering ordering a part of the criteria for equality is based on the fact that their collections aren't ordered in the first place (bar a few). What's interesting to note is that Java doesn't consider ordering even when using collections like LinkedMap. See http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/8u40-b25/java/util/LinkedHashMap.java

rtheunissen avatar Nov 10 '17 00:11 rtheunissen