crystal
crystal copied to clipboard
`union`, `intersection` and `overlaps?` for `Range`s
Apologies for the earlier PR from my forked master branch.
Based on the request from @jgaskins. Behaviors below:
a = 1..5
b = 3..7
a.union(b) # => 1..7
a.intersection(b) # => 3..5
c = 1..10
d = 1...15
c.union(d) # => 1...15
c.intersection(d) # => 1..10
e = 1...10
f = 1..15
e.union(f) # => 1..15
e.intersection(f) # => 1...10
g = 1..5
h = 10..20
g.union(h) # => 0..0
g.intersection(h) # => 0..0
(1...10).union(1..10) # => 1..10
(1...10).intersection(1..10) # => 1...10
(1...10).union(10..20) # => 1..20
(1..5).union(5..10) # => 1..10
(1...10).intersection(7..12) # => 7...10
(2..10).intersection(0..8) # => 2..8
(1...10).overlaps?(10..11) # => false
(1..10).overlaps?(5..9) # => true
Fixes #14487 Fixes #14488 Fixes #10148
P.S., very excited to be making a small contribution to a lovely language.
Looking forward to review!
This looks like a great start. A couple areas of focus:
- Ranges can be
nilon either end - It would be awesome to see some tests — this functionality can be super confusing and, honestly, I wouldn't be able to tell whether any implementation was correct when all edge cases are considered
Some specs would be a good idea as well :+1:.
Tests and specs? In this economy?
Added some specs (edit: will go back and add tests for other types of Ranges) and went with the nil return as originally suggested in @jgaskins' issue.
On error handling - should I be adding any type handling in the functions themselves or is an error message like below sufficient?
(1..5).union(time_range)
---
In src/range.cr:263:25
263 | if self.end < other.begin || other.end < self.begin
^----
Error: expected argument #1 to 'Int32#<' to be Float32, Float64, Int128, Int16, Int32, Int64, Int8, Number, UInt128, UInt16, UInt32, UInt64 or UInt8, not Time
or
(time_range).union(1..5)
---
In src/range.cr:263:25
263 | if self.end < other.begin || other.end < self.begin
^----
Error: expected argument #1 to 'Time#<' to be Time, not Int32
I don't think we can do much about improving these error message anyways. That would require knowing for which types the comparison operator is defined, and we cannot know this. We mustn't restrict it to identical types because there are many examples where different typs are still comparable.
Added some additional tests for other datatypes (let me know if there are any more that should be spec'd), but running into an interesting issue - detailed over one of the tests.
# TODO: This test will fail -
# In an integer setting this case is doable. We can simply check adjaceny with +-1.
# However, how do we define an adjacent range for other data types?
# Examples of adjacency ambiguity:
# - Floats: Is 4.1's upper adjacent value 4.2 or 4.11, etc.?
# - Times: Is a time's adjacent value +1 second or +1 hour, etc.?
#
# How do we define adjacency?
#
# (1..5).union(6..10).should eq(1..10) # Adjacent ranges
Some of these intended adjacencies would depend on the user's use case, but I'm not sure there's a reliable way to allow the user that kind of customizability. Is there a standard we should follow?
Adjacency can only work for types with (meaningfully) discrete values. This should be indicated by #succ and #pred methods defined on the item type. Range#each (and #reverse_each) are based on that.
Technically, most (all?) comparable types have discrete steps in some form, but they're not always meaningful. Float#next_float and time + 1.nanosecond would return adjacent values, but those are defined in the representation and not very useful in practice.
So for types that define #succ or #pred, we could discover adjacency. For all others, to form a union the ranges must at least touch. This can match exclusive ranges though: (1.0...5.0).union(5.0..10.0).should eq(1.0..10.0). This would be the only case where two ranges with continuous values could form a valid union without overlap.
What about some edge cases? For example empty ranges:
(1...1).intersect(0..10) #=> empty
What about some edge cases? For example empty ranges:
(1...1).intersect(0..10) #=> empty
I can add a check for one of the ranges being empty (i.e. non-inclusive ranges where the ends are equal), but is it valid to return 1...1, which is an accurate representation of the intersection. (...is it accurate?)
Apologies for the untimely updates here - been a bit of a hectic few weeks, but will be jumping back into this asap! I have not forgotten! :sweat_smile:
I can add a check for one of the ranges being empty (i.e. non-inclusive ranges where the ends are equal), but is it valid to return 1...1, which is an accurate representation of the intersection. (...is it accurate?)
Well, 1...1 is empty, so I don't expect it to intersect with anything but another empty range. IMO it should return an empty range. Hence the suggestions to return an empty (whatever the chosen values) instead of nil. The question is whether returning whatever values can have some edge effects?
Well, 1...1 is empty, so I don't expect it to intersect with anything but another empty range. IMO it should return an empty range. Hence the suggestions to return an empty (whatever the chosen values) instead of nil. The question is whether returning whatever values can have some edge effects?
I could imagine some edge effects, but only if the end user was comparing data to either the beginning or end of the empty range without considering context of the full object (i.e., both ends).
I don't think it's unreasonable to expect users to handle that sort of return if their use case could result that way, but I'm open to other thoughts.
I think I'm also a little unclear as to why nil isn't a valid return in such a case, but maybe I'm missing something.
It's not that nil is an invalid value, but it's cumbersome. Anytime you intersect you MUST deal with a nilable, when an empty range would be a transparent NOOP.
Coming back to this - what do people think is a reasonable empty range to return in the edge cases discussed above? I'm not sure if we've reached a solid conclusion.
I implemented empty ranges with the logic @straight-shoota describes in #10148, which describes a #clamp function (which is interchangeable with #intersect):
If both ranges are disparate ((-5..5).clamp(10..100)), the result should probably be an empty range. This can be expressed as an exclusive range with identical begin and end (x...x). But what should x be? Probably best to use the begin of the clamped range (-5...-5).
I think this is a good resolution as it logically makes sense and won't require the user to handle nil.
I've also added #10148 to this PR, but I think there's a case to be made for giving #intersection an alias #clamp, which users might find intuitive.
Bringing this PR back up...are there any requested changes or should we get this one merged?
Specs are failing. And we're missing specs for endless and beginless ranges (probably also the implementation doesn't handle them correctly yet).
A couple of test cases for the clamp behaviour: https://github.com/straight-shoota/crystal/blob/27c3ee98d22c48d426192c75e7c2656ef4ddb53c/spec/std/range_spec.cr#L809-L829
@CTC97 Are you up for completing this PR? Or would you prefer someone else to take it over the finish line?
@straight-shoota Happy to let someone finish this one - sorry for falling behind on this one guys!
Comparison with the specs from https://github.com/crystal-lang/crystal/pull/10300 reveals this PR misses or only partially covers some scenarios. For example, open-ended ranges or ranges abutting at an exclusive end. There might be a couple of edge cases on top of that.
That's just to say it'll require a bit more work to finalize this.