Support union of multiple intervals
- #103
This also provides a way to simplify DomainSets.UnionDomain of intervals.
Codecov Report
:x: Patch coverage is 98.18182% with 1 line in your changes missing coverage. Please review.
:white_check_mark: Project coverage is 98.62%. Comparing base (13c76e1) to head (a099521).
:warning: Report is 25 commits behind head on master.
| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/interval.jl | 90.00% | 1 Missing :warning: |
Additional details and impacted files
@@ Coverage Diff @@
## master #156 +/- ##
==========================================
- Coverage 99.11% 98.62% -0.49%
==========================================
Files 6 7 +1
Lines 225 291 +66
==========================================
+ Hits 223 287 +64
- Misses 2 4 +2
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
:rocket: New features to boost your workflow:
- :snowflake: Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
Is the restriction to TypedEndpointsInterval needed anymore?
The function calls _union. Is that supposed to be a generic method?
The implementation is now to use an iterator-based method to merge sorted intervals. The bottleneck is sorting.
SortingNetworks.jl is fast but doesn't support sorting different types, so it's abandoned.
TupleTools.jl is faster when there are <=6 elements. Then the only available method is Base.sort on vectors. I choose static vector after some benchmarking.
For the union of 2 intervals, the old method is still 25% faster, so it's reserved.
Important changes:
- ref Removed StaticArrays.jl. Instead, use the basic vector to sort. Due to the loss of performance, TupleTools.jl becomes significantly faster. However, due to the exponential compilation time of tuples, the bound is set to 20, that is, for size <= 20, use TupleTools.
- ref Removed DomainSets.jl from the testing environment. As a result, we lost
setdiffand thus the ability to test the correctness of unions.
There is one uncovered line.
union(d::TypedEndpointsInterval) = d # 1 interval
It's to avoid calling
union(I::TypedEndpointsInterval...) = iterunion(sort!(collect(I); lt = leftof)) # ≥21 intervals
in that case, so I don't think it's worth testing.
Splatting is reasonably used only for small numbers of arguments in Julia. Then, the most straightforward dependency-free implementation is even more performant than this PR:
julia> ints = Tuple(i..(i+1) for i in 1:10)
(1 .. 2, 2 .. 3, 3 .. 4, 4 .. 5, 5 .. 6, 6 .. 7, 7 .. 8, 8 .. 9, 9 .. 10, 10 .. 11)
# this PR:
julia> @btime union($ints...)
255.479 ns (23 allocations: 1.52 KiB)
1 .. 11
# naive implementation:
julia> myunion(ints...) = reduce(union, ints)
julia> @btime myunion($ints...)
18.556 ns (0 allocations: 0 bytes)
1 .. 11
UPD: ah, I see, it does require sorting... I heard Julia is adding tuple sorting soon though?
Yeah, don't know the current state of the Julia PR https://github.com/JuliaLang/julia/pull/46104, but copying tuple sorting from there
function tsort(x::NTuple{N}; lt::Function=isless, by::Function=identity,
rev::Union{Bool,Nothing}=nothing, order::Base.Ordering=Base.Forward) where N
o = Base.ord(lt,by,rev,order)
issorted(x, o) ? x : _sort(x, o)
end
_sort(x::Union{NTuple{0}, NTuple{1}}, o::Base.Ordering) = x
function _sort(x::NTuple, o::Base.Ordering)
a, b = Base.IteratorsMD.split(x, Val(length(x)>>1))
merge(_sort(a, o), _sort(b, o), o)
end
merge(x::NTuple, y::NTuple{0}, o::Base.Ordering) = x
merge(x::NTuple{0}, y::NTuple, o::Base.Ordering) = y
merge(x::NTuple{0}, y::NTuple{0}, o::Base.Ordering) = x # Method ambiguity
merge(x::NTuple, y::NTuple, o::Base.Ordering) =
(Base.lt(o, y[1], x[1]) ? (y[1], merge(x, Base.tail(y), o)...) : (x[1], merge(Base.tail(x), y, o)...))
julia> myunion(ints...) = reduce(union, tsort(ints; by=i -> (leftendpoint(i), isleftopen(i))))
julia> @btime myunion($ints...)
59.723 ns (0 allocations: 0 bytes)
Yeah, don't know the current state of the Julia PR JuliaLang/julia#46104, but copying tuple sorting from there
I get:
julia> myunion(ints...) = reduce(union, tsort(ints; by=i -> (leftendpoint(i), isleftopen(i)))) julia> @btime myunion($ints...) 59.723 ns (0 allocations: 0 bytes)
If I understand correctly, it only supports type-stable tuple sorting, that is, all items must be of the exact same type.
Edit: https://github.com/JuliaLang/julia/pull/52010
By the way, that implementation is exactly the same as TupleTools.jl.