crystal `union`, `intersection` and `overlaps?` for `Range`s

Apologies for the earlier PR from my forked master branch.

Based on the request from @jgaskins. Behaviors below:

a = 1..5
b = 3..7
a.union(b)        # => 1..7
a.intersection(b) # => 3..5

c = 1..10
d = 1...15
c.union(d)        # => 1...15
c.intersection(d) # => 1..10

e = 1...10
f = 1..15
e.union(f)        # => 1..15
e.intersection(f) # => 1...10

g = 1..5
h = 10..20
g.union(h)        # => 0..0
g.intersection(h) # => 0..0

(1...10).union(1..10)        # => 1..10
(1...10).intersection(1..10) # => 1...10

(1...10).union(10..20) # => 1..20

(1..5).union(5..10) # => 1..10

(1...10).intersection(7..12) # => 7...10
(2..10).intersection(0..8)   # => 2..8

(1...10).overlaps?(10..11)   # => false
(1..10).overlaps?(5..9) # => true

Fixes #14487 Fixes #14488 Fixes #10148

P.S., very excited to be making a small contribution to a lovely language.

Looking forward to review!

Oct 21 '24 01:10 CTC97

This looks like a great start. A couple areas of focus:

Ranges can be nil on either end
It would be awesome to see some tests — this functionality can be super confusing and, honestly, I wouldn't be able to tell whether any implementation was correct when all edge cases are considered

Oct 21 '24 03:10 jgaskins

Some specs would be a good idea as well :+1:.

Oct 21 '24 03:10 Blacksmoke16

Tests and specs? In this economy?

Oct 21 '24 04:10 jgaskins

Added some specs (edit: will go back and add tests for other types of Ranges) and went with the nil return as originally suggested in @jgaskins' issue.

On error handling - should I be adding any type handling in the functions themselves or is an error message like below sufficient?

(1..5).union(time_range)

---

In src/range.cr:263:25

 263 | if self.end < other.begin || other.end < self.begin
                           ^----
Error: expected argument #1 to 'Int32#<' to be Float32, Float64, Int128, Int16, Int32, Int64, Int8, Number, UInt128, UInt16, UInt32, UInt64 or UInt8, not Time

or

(time_range).union(1..5)

---

In src/range.cr:263:25

 263 | if self.end < other.begin || other.end < self.begin
                           ^----
Error: expected argument #1 to 'Time#<' to be Time, not Int32

Oct 22 '24 02:10 CTC97

I don't think we can do much about improving these error message anyways. That would require knowing for which types the comparison operator is defined, and we cannot know this. We mustn't restrict it to identical types because there are many examples where different typs are still comparable.

Oct 22 '24 11:10 straight-shoota

Added some additional tests for other datatypes (let me know if there are any more that should be spec'd), but running into an interesting issue - detailed over one of the tests.

  # TODO: This test will fail - 
  #   In an integer setting this case is doable. We can simply check adjaceny with +-1.
  #   However, how do we define an adjacent range for other data types? 
  #     Examples of adjacency ambiguity:
  #       - Floats: Is 4.1's upper adjacent value 4.2 or 4.11, etc.?
  #       - Times: Is a time's adjacent value +1 second or +1 hour, etc.?
  # 
  #   How do we define adjacency?
  #   
  # (1..5).union(6..10).should eq(1..10)      # Adjacent ranges

Some of these intended adjacencies would depend on the user's use case, but I'm not sure there's a reliable way to allow the user that kind of customizability. Is there a standard we should follow?

Oct 23 '24 01:10 CTC97

Adjacency can only work for types with (meaningfully) discrete values. This should be indicated by #succ and #pred methods defined on the item type. Range#each (and #reverse_each) are based on that. Technically, most (all?) comparable types have discrete steps in some form, but they're not always meaningful. Float#next_float and time + 1.nanosecond would return adjacent values, but those are defined in the representation and not very useful in practice.

So for types that define #succ or #pred, we could discover adjacency. For all others, to form a union the ranges must at least touch. This can match exclusive ranges though: (1.0...5.0).union(5.0..10.0).should eq(1.0..10.0). This would be the only case where two ranges with continuous values could form a valid union without overlap.

Oct 23 '24 08:10 straight-shoota

What about some edge cases? For example empty ranges:

(1...1).intersect(0..10) #=> empty

Oct 23 '24 14:10 ysbaddaden

What about some edge cases? For example empty ranges:
(1...1).intersect(0..10) #=> empty

I can add a check for one of the ranges being empty (i.e. non-inclusive ranges where the ends are equal), but is it valid to return 1...1, which is an accurate representation of the intersection. (...is it accurate?)

Oct 25 '24 02:10 CTC97

Apologies for the untimely updates here - been a bit of a hectic few weeks, but will be jumping back into this asap! I have not forgotten! :sweat_smile:

Nov 05 '24 15:11 CTC97

I can add a check for one of the ranges being empty (i.e. non-inclusive ranges where the ends are equal), but is it valid to return 1...1, which is an accurate representation of the intersection. (...is it accurate?)

Well, 1...1 is empty, so I don't expect it to intersect with anything but another empty range. IMO it should return an empty range. Hence the suggestions to return an empty (whatever the chosen values) instead of nil. The question is whether returning whatever values can have some edge effects?

Nov 14 '24 11:11 ysbaddaden

Well, 1...1 is empty, so I don't expect it to intersect with anything but another empty range. IMO it should return an empty range. Hence the suggestions to return an empty (whatever the chosen values) instead of nil. The question is whether returning whatever values can have some edge effects?

I could imagine some edge effects, but only if the end user was comparing data to either the beginning or end of the empty range without considering context of the full object (i.e., both ends).

I don't think it's unreasonable to expect users to handle that sort of return if their use case could result that way, but I'm open to other thoughts.

I think I'm also a little unclear as to why nil isn't a valid return in such a case, but maybe I'm missing something.

Nov 16 '24 01:11 CTC97

It's not that nil is an invalid value, but it's cumbersome. Anytime you intersect you MUST deal with a nilable, when an empty range would be a transparent NOOP.

Nov 18 '24 10:11 ysbaddaden

Coming back to this - what do people think is a reasonable empty range to return in the edge cases discussed above? I'm not sure if we've reached a solid conclusion.

Dec 04 '24 03:12 CTC97

I implemented empty ranges with the logic @straight-shoota describes in #10148, which describes a #clamp function (which is interchangeable with #intersect):

If both ranges are disparate ((-5..5).clamp(10..100)), the result should probably be an empty range. This can be expressed as an exclusive range with identical begin and end (x...x). But what should x be? Probably best to use the begin of the clamped range (-5...-5).

I think this is a good resolution as it logically makes sense and won't require the user to handle nil.

I've also added #10148 to this PR, but I think there's a case to be made for giving #intersection an alias #clamp, which users might find intuitive.

Dec 06 '24 02:12 CTC97

Bringing this PR back up...are there any requested changes or should we get this one merged?

Jan 06 '25 16:01 CTC97

Specs are failing. And we're missing specs for endless and beginless ranges (probably also the implementation doesn't handle them correctly yet).

A couple of test cases for the clamp behaviour: https://github.com/straight-shoota/crystal/blob/27c3ee98d22c48d426192c75e7c2656ef4ddb53c/spec/std/range_spec.cr#L809-L829

Mar 28 '25 09:03 straight-shoota

@CTC97 Are you up for completing this PR? Or would you prefer someone else to take it over the finish line?

May 01 '25 20:05 straight-shoota

@straight-shoota Happy to let someone finish this one - sorry for falling behind on this one guys!

May 02 '25 14:05 CTC97

Comparison with the specs from https://github.com/crystal-lang/crystal/pull/10300 reveals this PR misses or only partially covers some scenarios. For example, open-ended ranges or ranges abutting at an exclusive end. There might be a couple of edge cases on top of that.

That's just to say it'll require a bit more work to finalize this.

May 09 '25 17:05 straight-shoota

crystal crystal copied to clipboard

`union`, `intersection` and `overlaps?` for `Range`s

crystal
crystal copied to clipboard