NullableArrays.jl icon indicating copy to clipboard operation
NullableArrays.jl copied to clipboard

Add ==(::NullableArray, ::NullableArray) and ==(::NullableArray, ::AbstractArray)

Open nalimilan opened this issue 10 years ago • 15 comments

Fixes #82.

nalimilan avatar Oct 12 '15 15:10 nalimilan

Current coverage is 90.18%

Merging #84 into master will decrease coverage by -0.09% as of 0663641

@@            master     #84   diff @@
======================================
  Files           12      12       
  Stmts          586     611    +25
  Branches         0       0       
  Methods          0       0       
======================================
+ Hit            529     551    +22
  Partial          0       0       
- Missed          57      60     +3

Review entire Coverage Diff as of 0663641

Powered by Codecov. Updated on successful CI builds.

codecov-io avatar Oct 12 '15 15:10 codecov-io

Let's merge this? I need it for https://github.com/JuliaStats/DataFrames.jl/pull/1008.

nalimilan avatar Aug 28 '16 10:08 nalimilan

We should first define hash too, right?

davidagold avatar Aug 28 '16 14:08 davidagold

This is probably also effected by the discussion in #85? I.e. the semantics for == should be the same for scalar and arrays, right?

davidanthoff avatar Aug 28 '16 17:08 davidanthoff

I.e. the semantics for == should be the same for scalar and arrays, right?

I don't think so, since NullableArray is not itself a Nullable type. So ==(X::NullableArray, Y::NullableArray) should return a Bool, whereas I think legitimate cases have been made for returning either a Bool or a Nullable{Bool} from ==(x::Nullable, y::Nullable).

davidagold avatar Aug 28 '16 17:08 davidagold

I don't think so, since NullableArray is not itself a Nullable type. So ==(X::NullableArray, Y::NullableArray) should return a Bool, whereas I think legitimate cases have been made for returning either a Bool or a Nullable{Bool} from ==(x::Nullable, y::Nullable).

@davidagold I agree with you (I'm always in favor of returning a Bool from any comparison operator), but that is not how this PR implements it. ==(X::NullableArray, Y::NullableArray) returns a Nullable{Bool} if this PR gets merged.

davidanthoff avatar Aug 28 '16 18:08 davidanthoff

@davidanthoff Hmm, so it does. Good point. I don't think that's appropriate.

This may be unpopular, but I think the behavior described in this PR is more appropriately expressed by isequal, module returning a Nullable{Bool}. There is something of an analogy between NaN and Nullable, and for the former, isequal provides a more "relaxed" behavior than ==. So I think it is appropriate that this sort of relaxed behavior should belong to isequal in the case of comparing Nullables as well. Indeed, Base already implements this sort of relaxed comparison behavior for isequal(x::Nullable, y::Nullable), hence isequal(X::NullableArray, Y::NullableArray) inherits precisely the desired behavior from the AbstractArray interface. However, this is at the expense of having

julia> isequal(Nullable(), Nullable())
true

@nalimilan How do you feel about using isequal in place of == in JuliaStats/DataFrames.jl#1008?

davidagold avatar Aug 28 '16 18:08 davidagold

The current behavior of isequal is indeed correct. What I'm proposing here is a definition of == which is consistent with that used for Nullable and for AbstractArray. Indeed, ==(::AbstractArray, ::AbstractArray) calls == on each pair of elements of the arrays; that definition must be extended to return a Nullable{Bool} to handle missing values. Else, we would be treating nulls as false, which we don't currently do anywhere else.

If we were to return a Bool from ==(::Nullable, ::Nullable), then we would also have to change the present PR. But if we keep the current definition, I think the most consistent choice is to return a Nullable{Bool} for both.

@nalimilan How do you feel about using isequal in place of == in JuliaStats/DataFrames.jl#1008?

I'm actually using isequal now, which is mostly what is needed when writing tests. But == for DataFrame recursively calls == on its elements, which currently triggers an error for NullableArrays. (That said, that function probably isn't very useful outside testing, where isequal is more appropriate. It's mostly needed for completeness.)

nalimilan avatar Aug 28 '16 21:08 nalimilan

I see. And the behavior implemented herein is different from that of isequal, anyway. But won't returning Nullable{Bool} break the DataFrame comparison, then? Or have you already taken that into account? And will comparing two DataFrames return a Nullable{Bool}?

davidagold avatar Aug 28 '16 22:08 davidagold

Yes, ==(::DataFrame, ::DataFrame) would also have to return a Nullable{Bool}, which requires some tweaking.

nalimilan avatar Aug 29 '16 09:08 nalimilan

Yes, ==(::DataFrame, ::DataFrame) would also have to return a Nullable{Bool}, which requires some tweaking.

I see why having ==(::Nullable,::Nullable) return a Nullable{Bool} can sometimes be useful (although I think it complicates the design way too much to be worth the hassle), but I can't come up with any scenario where comparing two DataFrames or two NullableArrays via == returning Nullable{Bool} would actually be useful. Maybe just lack of imagination, but is there any scenario where you would compare these objects and not be in a situation where you actually want a boolean value?

davidanthoff avatar Aug 29 '16 20:08 davidanthoff

Honestly, I don't even know when comparing two data frames could really be useful apart from testing or checking results. My proposal just tries to preserve consistency with ==(::Nullable, ::Nullable), which is IMO the fundamental decision.

nalimilan avatar Aug 29 '16 21:08 nalimilan

Yeah, but checking results might be quite a common and useful thing? I don't know, you load data and manipulate data in some way (via Query.jl or jplyr.jl), and then you change your query to be more concise and want to make sure you still produce the same result set. That is something that might be quite common, but it would require a Bool as a result to be useful.

davidanthoff avatar Aug 29 '16 21:08 davidanthoff

As I said, there's isequal for that already, which might be enough.

nalimilan avatar Aug 29 '16 21:08 nalimilan

then you change your query to be more concise and want to make sure you still produce the same result set. That is something that might be quite common, but it would require a Bool as a result to be useful.

That's probably not the best way to convince oneself of the equality of those queries, anyway =p

davidagold avatar Aug 29 '16 22:08 davidagold