IntervalSets.jl
IntervalSets.jl copied to clipboard
This package doesn't use the total order predicates
In AcceleratedArrays.jl I've been playing with "search intervals" for querying data, for example finding all the dates within a given range with the help of a sort-based acceleration index.
I've shied away from using this package because the predicates for total order used by sorting algorithms in Julia are isequal and isless and the accelerations I rely on (e.g. this) may not be valid for < and == comparisons.
I was wondering if there was a reason for chosing these operators, and whether or not we might consider it advantageous to use isless and isequal instead?
(One difficulty I have is dealing with data where some values are missing, where < and == aren't even predicates that return Bool. In this case I still expect findall(in(0..10), array::Array{Union{Int, Missing}}) to work correctly and not crash!)
You're talking about the implementations of in, presumably?
What you're asking makes quite a bit of sense, but I find myself uneasy about examples like these:
julia> isless(3, NaN)
true
julia> x = 1..NaN
1.0..NaN
And now we vote: what should the answer for the following be?
julia> 2 in x
(Currently it gives false.)
Similarly, what is 2 in 1..missing?
Shouldn’t that just be missing ?
I kind of think of intervals as non-iterable sets, where in would be a Boolean-valued function and would rely on isequal semantics for implementation like in Set.
(If in wasn’t Boolean... then we can’t use if a in b; ...; end and the pun of for a in b; ...; end is a little confusing).
I can’t imagine a case where you want that to work when there’s missings flying around....
I can’t imagine a case where you want that to work
Sorry, which to work, exactly?
I think my point of view could be expressed like this: I don't see why the function below should ever throw an exception for iterable iter (that would be terrible for readability and user-friendliness).
function f(iter)
for x in iter
if !(x in iter)
error("WTF!?")
end
end
end
f(Set(1,2,3,NaN,missing))
f([1,2,3,NaN,missing])
...
Making the above be sane constrains both the behavior and the return type of in. We really need more Bool predicate functions than ===, <: and isa in the langauge (or else these become the only things that are safe to put into an if predicate in generic code).
I find myself uneasy about examples like these
@timholy I also find the example you gave a little unsettling, but it bothers me less than my example function above.
I can’t imagine a case where you want that to work when there’s missings flying around....
To give a concrete example, I would really like to do filter(in(1..10), data) where e.g. data is an arary like [0, 2, 5, 11, NaN, missing] and get [2, 5] as the answer. In this case 2 and 5 are the only things that are between 1 and 10 (inclusively) so that's the values filter preserves. (IMO it's quite useless for filter to throw an exception here).
Similarly, what is
2 in 1..missing?
I should have said earlier, the cases that interest me more are
NaN in 1.0..10.0
missing in 1.0..10.0
I'm convinced, I agree with your argument (though your original examples 1.0..missing and 1.0..NaN should just error out).
I think the problem is that in has no consistent definition in Base, as it is consistent with ==/> for AbstractArray, but with isequal/isless for Set. It's hard to choose which definition is best for intervals. It would be nice to introduce new functions/methods so that the caller can specify which semantics are desired (waiting for deeper changes in Julia 2.0). I've filed https://github.com/JuliaLang/julia/issues/37157 about this.
For intervals, one interesting case is -0.0. Should -0.0 in 0..1 return true or false?
For intervals, one interesting case is -0.0. Should -0.0 in 0..1 return true or false?
It should be true because -0.0 == 0 is true.
Also:
julia> -0.0 in 0.0
true