Add ==(::NullableArray, ::NullableArray) and ==(::NullableArray, ::AbstractArray)
Fixes #82.
Current coverage is 90.18%
Merging #84 into master will decrease coverage by -0.09% as of
0663641
@@ master #84 diff @@
======================================
Files 12 12
Stmts 586 611 +25
Branches 0 0
Methods 0 0
======================================
+ Hit 529 551 +22
Partial 0 0
- Missed 57 60 +3
Review entire Coverage Diff as of
0663641Powered by Codecov. Updated on successful CI builds.
Let's merge this? I need it for https://github.com/JuliaStats/DataFrames.jl/pull/1008.
We should first define hash too, right?
This is probably also effected by the discussion in #85? I.e. the semantics for == should be the same for scalar and arrays, right?
I.e. the semantics for == should be the same for scalar and arrays, right?
I don't think so, since NullableArray is not itself a Nullable type. So ==(X::NullableArray, Y::NullableArray) should return a Bool, whereas I think legitimate cases have been made for returning either a Bool or a Nullable{Bool} from ==(x::Nullable, y::Nullable).
I don't think so, since
NullableArrayis not itself aNullabletype. So==(X::NullableArray, Y::NullableArray)should return a Bool, whereas I think legitimate cases have been made for returning either aBoolor aNullable{Bool}from==(x::Nullable, y::Nullable).
@davidagold I agree with you (I'm always in favor of returning a Bool from any comparison operator), but that is not how this PR implements it. ==(X::NullableArray, Y::NullableArray) returns a Nullable{Bool} if this PR gets merged.
@davidanthoff Hmm, so it does. Good point. I don't think that's appropriate.
This may be unpopular, but I think the behavior described in this PR is more appropriately expressed by isequal, module returning a Nullable{Bool}. There is something of an analogy between NaN and Nullable, and for the former, isequal provides a more "relaxed" behavior than ==. So I think it is appropriate that this sort of relaxed behavior should belong to isequal in the case of comparing Nullables as well. Indeed, Base already implements this sort of relaxed comparison behavior for isequal(x::Nullable, y::Nullable), hence isequal(X::NullableArray, Y::NullableArray) inherits precisely the desired behavior from the AbstractArray interface. However, this is at the expense of having
julia> isequal(Nullable(), Nullable())
true
@nalimilan How do you feel about using isequal in place of == in JuliaStats/DataFrames.jl#1008?
The current behavior of isequal is indeed correct. What I'm proposing here is a definition of == which is consistent with that used for Nullable and for AbstractArray. Indeed, ==(::AbstractArray, ::AbstractArray) calls == on each pair of elements of the arrays; that definition must be extended to return a Nullable{Bool} to handle missing values. Else, we would be treating nulls as false, which we don't currently do anywhere else.
If we were to return a Bool from ==(::Nullable, ::Nullable), then we would also have to change the present PR. But if we keep the current definition, I think the most consistent choice is to return a Nullable{Bool} for both.
@nalimilan How do you feel about using isequal in place of == in JuliaStats/DataFrames.jl#1008?
I'm actually using isequal now, which is mostly what is needed when writing tests. But == for DataFrame recursively calls == on its elements, which currently triggers an error for NullableArrays. (That said, that function probably isn't very useful outside testing, where isequal is more appropriate. It's mostly needed for completeness.)
I see. And the behavior implemented herein is different from that of isequal, anyway. But won't returning Nullable{Bool} break the DataFrame comparison, then? Or have you already taken that into account? And will comparing two DataFrames return a Nullable{Bool}?
Yes, ==(::DataFrame, ::DataFrame) would also have to return a Nullable{Bool}, which requires some tweaking.
Yes, ==(::DataFrame, ::DataFrame) would also have to return a Nullable{Bool}, which requires some tweaking.
I see why having ==(::Nullable,::Nullable) return a Nullable{Bool} can sometimes be useful (although I think it complicates the design way too much to be worth the hassle), but I can't come up with any scenario where comparing two DataFrames or two NullableArrays via == returning Nullable{Bool} would actually be useful. Maybe just lack of imagination, but is there any scenario where you would compare these objects and not be in a situation where you actually want a boolean value?
Honestly, I don't even know when comparing two data frames could really be useful apart from testing or checking results. My proposal just tries to preserve consistency with ==(::Nullable, ::Nullable), which is IMO the fundamental decision.
Yeah, but checking results might be quite a common and useful thing? I don't know, you load data and manipulate data in some way (via Query.jl or jplyr.jl), and then you change your query to be more concise and want to make sure you still produce the same result set. That is something that might be quite common, but it would require a Bool as a result to be useful.
As I said, there's isequal for that already, which might be enough.
then you change your query to be more concise and want to make sure you still produce the same result set. That is something that might be quite common, but it would require a Bool as a result to be useful.
That's probably not the best way to convince oneself of the equality of those queries, anyway =p