IntervalSets.jl icon indicating copy to clipboard operation
IntervalSets.jl copied to clipboard

This package doesn't use the total order predicates

Open andyferris opened this issue 7 years ago • 14 comments

In AcceleratedArrays.jl I've been playing with "search intervals" for querying data, for example finding all the dates within a given range with the help of a sort-based acceleration index.

I've shied away from using this package because the predicates for total order used by sorting algorithms in Julia are isequal and isless and the accelerations I rely on (e.g. this) may not be valid for < and == comparisons.

I was wondering if there was a reason for chosing these operators, and whether or not we might consider it advantageous to use isless and isequal instead?

(One difficulty I have is dealing with data where some values are missing, where < and == aren't even predicates that return Bool. In this case I still expect findall(in(0..10), array::Array{Union{Int, Missing}}) to work correctly and not crash!)

andyferris avatar Sep 06 '18 05:09 andyferris

You're talking about the implementations of in, presumably?

What you're asking makes quite a bit of sense, but I find myself uneasy about examples like these:

julia> isless(3, NaN)
true

julia> x = 1..NaN
1.0..NaN

And now we vote: what should the answer for the following be?

julia> 2 in x

(Currently it gives false.)

timholy avatar Sep 15 '19 10:09 timholy

Similarly, what is 2 in 1..missing?

andyferris avatar Sep 17 '19 08:09 andyferris

Shouldn’t that just be missing ?

dlfivefifty avatar Sep 17 '19 08:09 dlfivefifty

I kind of think of intervals as non-iterable sets, where in would be a Boolean-valued function and would rely on isequal semantics for implementation like in Set.

andyferris avatar Sep 17 '19 08:09 andyferris

(If in wasn’t Boolean... then we can’t use if a in b; ...; end and the pun of for a in b; ...; end is a little confusing).

andyferris avatar Sep 17 '19 08:09 andyferris

I can’t imagine a case where you want that to work when there’s missings flying around....

dlfivefifty avatar Sep 17 '19 08:09 dlfivefifty

I can’t imagine a case where you want that to work

Sorry, which to work, exactly?

I think my point of view could be expressed like this: I don't see why the function below should ever throw an exception for iterable iter (that would be terrible for readability and user-friendliness).

function f(iter)
    for x in iter
        if !(x in iter)
            error("WTF!?")
        end
    end
end

f(Set(1,2,3,NaN,missing))
f([1,2,3,NaN,missing])
...

Making the above be sane constrains both the behavior and the return type of in. We really need more Bool predicate functions than ===, <: and isa in the langauge (or else these become the only things that are safe to put into an if predicate in generic code).

andyferris avatar Sep 17 '19 11:09 andyferris

I find myself uneasy about examples like these

@timholy I also find the example you gave a little unsettling, but it bothers me less than my example function above.

andyferris avatar Sep 17 '19 11:09 andyferris

I can’t imagine a case where you want that to work when there’s missings flying around....

To give a concrete example, I would really like to do filter(in(1..10), data) where e.g. data is an arary like [0, 2, 5, 11, NaN, missing] and get [2, 5] as the answer. In this case 2 and 5 are the only things that are between 1 and 10 (inclusively) so that's the values filter preserves. (IMO it's quite useless for filter to throw an exception here).

andyferris avatar Sep 17 '19 11:09 andyferris

Similarly, what is 2 in 1..missing?

I should have said earlier, the cases that interest me more are

NaN in 1.0..10.0
missing in 1.0..10.0

andyferris avatar Sep 17 '19 11:09 andyferris

I'm convinced, I agree with your argument (though your original examples 1.0..missing and 1.0..NaN should just error out).

dlfivefifty avatar Sep 17 '19 12:09 dlfivefifty

I think the problem is that in has no consistent definition in Base, as it is consistent with ==/> for AbstractArray, but with isequal/isless for Set. It's hard to choose which definition is best for intervals. It would be nice to introduce new functions/methods so that the caller can specify which semantics are desired (waiting for deeper changes in Julia 2.0). I've filed https://github.com/JuliaLang/julia/issues/37157 about this.

For intervals, one interesting case is -0.0. Should -0.0 in 0..1 return true or false?

nalimilan avatar Aug 24 '20 13:08 nalimilan

For intervals, one interesting case is -0.0. Should -0.0 in 0..1 return true or false?

It should be true because -0.0 == 0 is true.

hyrodium avatar Apr 02 '22 13:04 hyrodium

Also:

julia> -0.0 in 0.0
true

dlfivefifty avatar Apr 02 '22 13:04 dlfivefifty