IntervalSets.jl icon indicating copy to clipboard operation
IntervalSets.jl copied to clipboard

Support union of multiple intervals

Open putianyi889 opened this issue 2 years ago • 8 comments

  • #103

This also provides a way to simplify DomainSets.UnionDomain of intervals.

putianyi889 avatar Sep 09 '23 09:09 putianyi889

Codecov Report

:x: Patch coverage is 98.18182% with 1 line in your changes missing coverage. Please review. :white_check_mark: Project coverage is 98.62%. Comparing base (13c76e1) to head (a099521). :warning: Report is 25 commits behind head on master.

Files with missing lines Patch % Lines
src/interval.jl 90.00% 1 Missing :warning:
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #156      +/-   ##
==========================================
- Coverage   99.11%   98.62%   -0.49%     
==========================================
  Files           6        7       +1     
  Lines         225      291      +66     
==========================================
+ Hits          223      287      +64     
- Misses          2        4       +2     

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

:rocket: New features to boost your workflow:
  • :snowflake: Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

codecov[bot] avatar Sep 09 '23 09:09 codecov[bot]

Is the restriction to TypedEndpointsInterval needed anymore?

dlfivefifty avatar Sep 09 '23 10:09 dlfivefifty

The function calls _union. Is that supposed to be a generic method?

putianyi889 avatar Sep 09 '23 11:09 putianyi889

The implementation is now to use an iterator-based method to merge sorted intervals. The bottleneck is sorting.

SortingNetworks.jl is fast but doesn't support sorting different types, so it's abandoned.

TupleTools.jl is faster when there are <=6 elements. Then the only available method is Base.sort on vectors. I choose static vector after some benchmarking.

For the union of 2 intervals, the old method is still 25% faster, so it's reserved.

putianyi889 avatar Feb 18 '24 09:02 putianyi889

Important changes:

  • ref Removed StaticArrays.jl. Instead, use the basic vector to sort. Due to the loss of performance, TupleTools.jl becomes significantly faster. However, due to the exponential compilation time of tuples, the bound is set to 20, that is, for size <= 20, use TupleTools.
  • ref Removed DomainSets.jl from the testing environment. As a result, we lost setdiff and thus the ability to test the correctness of unions.

There is one uncovered line.

union(d::TypedEndpointsInterval) = d # 1 interval

It's to avoid calling

union(I::TypedEndpointsInterval...) = iterunion(sort!(collect(I); lt = leftof)) # ≥21 intervals

in that case, so I don't think it's worth testing.

putianyi889 avatar Feb 22 '24 14:02 putianyi889

Splatting is reasonably used only for small numbers of arguments in Julia. Then, the most straightforward dependency-free implementation is even more performant than this PR:

julia> ints = Tuple(i..(i+1) for i in 1:10)
(1 .. 2, 2 .. 3, 3 .. 4, 4 .. 5, 5 .. 6, 6 .. 7, 7 .. 8, 8 .. 9, 9 .. 10, 10 .. 11)

# this PR:
julia> @btime union($ints...)
  255.479 ns (23 allocations: 1.52 KiB)
1 .. 11

# naive implementation:
julia> myunion(ints...) = reduce(union, ints)

julia> @btime myunion($ints...)
  18.556 ns (0 allocations: 0 bytes)
1 .. 11

UPD: ah, I see, it does require sorting... I heard Julia is adding tuple sorting soon though?

aplavin avatar Feb 22 '24 14:02 aplavin

Yeah, don't know the current state of the Julia PR https://github.com/JuliaLang/julia/pull/46104, but copying tuple sorting from there


	function tsort(x::NTuple{N}; lt::Function=isless, by::Function=identity,
	               rev::Union{Bool,Nothing}=nothing, order::Base.Ordering=Base.Forward) where N
	     o = Base.ord(lt,by,rev,order)
	    issorted(x, o) ? x : _sort(x, o)
	 end
	 _sort(x::Union{NTuple{0}, NTuple{1}}, o::Base.Ordering) = x
	 function _sort(x::NTuple, o::Base.Ordering)
	     a, b = Base.IteratorsMD.split(x, Val(length(x)>>1))
	     merge(_sort(a, o), _sort(b, o), o)
	 end
	 merge(x::NTuple, y::NTuple{0}, o::Base.Ordering) = x
	 merge(x::NTuple{0}, y::NTuple, o::Base.Ordering) = y
	 merge(x::NTuple{0}, y::NTuple{0}, o::Base.Ordering) = x # Method ambiguity
	 merge(x::NTuple, y::NTuple, o::Base.Ordering) =
	     (Base.lt(o, y[1], x[1]) ? (y[1], merge(x, Base.tail(y), o)...) : (x[1], merge(Base.tail(x), y, o)...))
I get:
julia> myunion(ints...) = reduce(union, tsort(ints; by=i -> (leftendpoint(i), isleftopen(i))))
julia> @btime myunion($ints...)
  59.723 ns (0 allocations: 0 bytes)

aplavin avatar Feb 22 '24 15:02 aplavin

Yeah, don't know the current state of the Julia PR JuliaLang/julia#46104, but copying tuple sorting from there

I get:

julia> myunion(ints...) = reduce(union, tsort(ints; by=i -> (leftendpoint(i), isleftopen(i))))
julia> @btime myunion($ints...)
  59.723 ns (0 allocations: 0 bytes)

If I understand correctly, it only supports type-stable tuple sorting, that is, all items must be of the exact same type.

Edit: https://github.com/JuliaLang/julia/pull/52010

By the way, that implementation is exactly the same as TupleTools.jl.

putianyi889 avatar Feb 22 '24 15:02 putianyi889