Range syntax

Open Reprevise opened this issue 9 months ago • 16 comments

Related: #1066 #1660 dart-lang/sdk#42652

EDITS:

  • Changed syntax from .. to ... to avoid collision with cascade syntax

I'm proposing a mostly complete range syntax as seen in other languages such as Zig and Rust. The range would be inclusive on both ends.

It would be something like this:

const List<String> items = ['foo', 'bar', 'bizz'];
final [foo, bar] = items[0...1]; // returns a list with the first two items "foo" and "bar"

It can also be used to create lists:

final List<int> numbers = 1...10; // makes an inclusive list with the numbers 1-10

It can be used to make for loops easier:

for (final i in 1...10) {
  print(i); // prints 1 through 10
}

for (final i in 10...1) { // we can even go backwards!
  print(i); // prints 10 through 1
}

Could be used to iterate through string characters (needs work):

for (final c in 'a'...'z') {
  print(c); // prints "a" through "z"
}

This syntax with strings would probably need some work. As commented in a related issue above, character literals could be introduced with a prefix, like c'a', and they might be required here: introducing a character type whose literals use plain single quotes would be a major breaking change.

Switch statements/expressions:

const int value = 10;

switch (value) {
  case 1...5:
    print("value is between 1 and 5");
  case 6...10:
    print("value is between 6 and 10"); // this prints
  default:
    print("value is $value");
}

Some other questions that need to be answered

Should ranges always be known at compile time? Should we be able to do something like this:

// Imagine [start] and [end] are fetched from some data source
final int start = 0;
final int end = 5;
final List<int> inBetween = start...end; // should this be valid?

Reprevise avatar Sep 25 '23 15:09 Reprevise

How would this work together with cascade syntax?

jakemac53 avatar Sep 25 '23 16:09 jakemac53

How would this work together with cascade syntax?

Oversight on my part, will probably edit the issue to use ... instead so as not to collide with the cascade syntax. AFAIK there should be no collisions with the spread syntax.

Reprevise avatar Sep 25 '23 16:09 Reprevise

Ruby's ranges can be both inclusive (..) and exclusive (...), but I think we could survive without the exclusive variation.

mateusfccp avatar Sep 25 '23 16:09 mateusfccp

If it only works for integers, I don't think it's worth doing. It's not trivial to make it work for anything else.

The range itself can fairly easily be done as:

import 'dart:collection';
List<int> r(int from, int to, [int step = 1]) => _RangeList(from, to, step);
class _RangeList extends ListBase<int> {
  final int length;
  final int _start;
  final int _step;
  _RangeList._(this._start, this.length, this._step);
  factory _RangeList(int from, int to, int step) {
    if (step == 0) throw ArgumentError.value(step, "step", "Must not be zero");
    if (step < 0) (from, to, step) = (to, from, -step);
    var count = ((to - from) ~/ step);
    // A single-element range keeps the positive step; a zero step would make `contains` divide by zero.
    if (count >= 0) return _RangeList._(from, count + 1, step);
    return _RangeList._(from, -count + 1, -step);
  }  
  bool contains(Object? other) {
    if (other is! int) {
      if (other is! num) return false;
      var otherInt = other.toInt();
      if (other != otherInt) return false;
      other = otherInt;
    }
    var stepIndex = (other - _start);
    if (_step == 1) {
      return 0 <= stepIndex && stepIndex < length;
    }
    var index = stepIndex ~/ _step;
    var remainder = stepIndex.remainder(_step);
    return remainder == 0 && index >= 0 && index < length;
  }
  int operator[](int index) {
    RangeError.checkValidIndex(index, this, "index", length);
    return _start + index * _step;
  }
  set length(int _) {
    throw UnsupportedError("Unmodifiable list");
  }
  void operator[]=(int _, int __) {
    throw UnsupportedError("Unmodifiable list");
  }
  void add(int _) {
    throw UnsupportedError("Unmodifiable list");
  }
}

then it's just:

for (var i in r(1, 10)) { ... }

and you can also do r(1, 10, 2) to skip by 2, r(1, 10, -1) which iterates from 10 to 1, or r(10, 1) which also iterates from 10 to 1. So much power, so few characters 😉. (The step should probably be named, but I can't be bothered to write names.)

(I'll even throw in

extension IntRanges on int {
  List<int> to(int to, {int step = 1}) => r(this, to, step);
  List<int> until(int to, {int step = 1}) => r(this, to - (to - this).sign, step);
}

for the 1.to(5) and 1.until(6) syntaxes, which are not shorter than r(1, 5). Heck, if brevity is the goal:

extension IntRangesFancy on int {
  List<int> operator[](int to) => r(this, to);
  List<int> call(int to) => r(this, to - (to - this).sign);
}

is an option: the closed range is 1[5], the open range is 1(6). Doesn't get shorter than that!)

The "slice" operation, words[1...2], is just words.sublist(1, 3). Or words.getRange(1, 3) if you want it lazy. (And I would want it lazy. I was worried when I saw 1..10 being a list, not an iterable, but when it's as easily computable as here, we can pre-compute the length, and compute the values on every access.)
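A minimal sketch of the two existing slice spellings mentioned above, using only `List.sublist` and `List.getRange` from the core library. The `words` list is made up for illustration.

```dart
void main() {
  // Hypothetical sample data for illustration.
  final words = ['zero', 'one', 'two', 'three'];

  // Eager copy of indices 1 and 2 (the end index 3 is exclusive).
  final List<String> copied = words.sublist(1, 3);

  // Lazy view over the same indices; elements are read from `words` on iteration.
  final Iterable<String> lazy = words.getRange(1, 3);

  print(copied); // [one, two]
  print(lazy.toList()); // [one, two]
}
```

Both calls take an exclusive end index, which is why the inclusive words[1...2] maps to an end of 3.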

The pattern case 1...5: is the same as case >= 1 && <= 5:. Which is not as pretty as the range, but just as powerful. And we avoid having to explain whether case 5...1: matches 4 or not. (I'd love if Dart allowed chained comparisons, if (0 <= index <= length)..., and if it did, I'd argue to allow a place-holder in relation patterns, so it could be case 1 <= _ <= 5:. Then that would be sufficient for me.)
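For comparison, here is the switch from the original post written with the relational patterns that Dart 3 already has. The `describe` function name is only for illustration.

```dart
String describe(int value) {
  // Relational patterns (>=, <=) combined with the logical-and pattern (&&).
  switch (value) {
    case >= 1 && <= 5:
      return 'value is between 1 and 5';
    case >= 6 && <= 10:
      return 'value is between 6 and 10';
    default:
      return 'value is $value';
  }
}

void main() {
  print(describe(10)); // value is between 6 and 10
  print(describe(42)); // value is 42
}
```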

All in all, it's possibly a slight improvement, but not something I'd consider as warranting so much new syntax. Especially if it only works on integers.

If it applied to other things than integers (the string examples are not convincing), then it would make more sense to have a generalized syntax.

Then it would probably apply to any T with a T operator+(int) and a bool operator<=(T), so we can iterate as

  var current = start; 
  while (current <= end) { 
    yield current; 
    current += step;   // or += 1.
  }

We don't have an interface for allowing + int and <= self, but we could introduce one.

(But if we had that interface, the r function could work with it too, so still not a big improvement over what's already possible.)
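A rough sketch of what such an interface might look like, and how a plain function could use it. `Steppable`, `range` and `Day` are hypothetical names, nothing like them exists in the SDK, and `int` would not implement the interface without SDK changes.

```dart
// Hypothetical interface: a type that can be advanced by an int and
// compared with `<=` against another value of the same type.
abstract interface class Steppable<T> {
  T operator +(int steps);
  bool operator <=(T other);
}

// Generic inclusive range built on the loop shown above (positive steps only).
Iterable<T> range<T extends Steppable<T>>(T start, T end,
    {int step = 1}) sync* {
  if (step <= 0) throw ArgumentError.value(step, 'step', 'Must be positive');
  var current = start;
  while (current <= end) {
    yield current;
    current = current + step;
  }
}

// Example implementer, made up for illustration.
class Day implements Steppable<Day> {
  final int number;
  const Day(this.number);

  @override
  Day operator +(int steps) => Day(number + steps);

  @override
  bool operator <=(Day other) => number <= other.number;

  @override
  String toString() => 'Day($number)';
}

void main() {
  print(range(const Day(1), const Day(7), step: 3).toList());
  // [Day(1), Day(4), Day(7)]
}
```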

lrhn avatar Sep 25 '23 21:09 lrhn

If it only works for integers, I don't think it's worth doing.

Agreed, it's why I proposed it works with strings too (user-defined types could have an interface to allow this syntax to work with their types too).

It's not trivial to make it work for anything else.

Not sure about using the <= and + operator overrides, it'd be pretty odd to allow "abc" + 1 (what would that print?).

and you can also do r(1, 10, 2) to skip by 2, r(1, 10, -1) which iterates from 10 to 1, or r(10, 1) which also iterates from 10 to 1.

I thought of adding steps to the syntax. My idea was something like 1...10:3. Though that leaves a question: What happens if the step value doesn't cleanly go to the end? Let's say we do something like this:

const value = 10;

switch (value) {
  case 1...10:4: // maybe not a colon syntax, looks a little odd in a switch statement
    print("This will never print because the list will be [1, 5, 9]");
}

Perhaps a lint could solve that, but still.
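To make the step problem concrete, the values of the hypothetical 1...10:4 can be written out in today's Dart. `stepped` is a made-up helper and assumes a positive step.

```dart
// Inclusive stepped range, spelled with a collection-for (positive step assumed).
List<int> stepped(int start, int end, int step) =>
    [for (var i = start; i <= end; i += step) i];

void main() {
  final values = stepped(1, 10, 4);
  print(values); // [1, 5, 9]
  print(values.contains(10)); // false
}
```

The end value 10 is never produced because (10 - 1) is not a multiple of 4.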

The pattern case 1...5: is the same as case >= 1 && <= 5:. Which is not as pretty as the range, but just as powerful.

I'd be okay if the range in a switch statement (w/o steps) was just syntactic sugar for what we have for pattern matching.

And we avoid having to explain whether case 5...1: matches 4 or not.

Pretty self-explanatory, no? It's a descending range.

Reprevise avatar Sep 25 '23 21:09 Reprevise

Why not make it an operator that can be defined? For integers it would default as generating a lazy iterable, and for others it can be defined as:

class CustomClass {
  <Type> operator...(<Type> rhs) { ... }
}

extension on String {
  // Hypothetical: `operator...` is the proposed syntax, not valid Dart today.
  Iterable<String> operator...(String end) sync* {
    assert(this.length == 1 && end.length == 1, "Only works for single character values!");
    int point = this.codeUnitAt(0);
    int endPoint = end.codeUnitAt(0);
    // Inclusive on both ends, so "A"..."Z" yields "Z" as well.
    while (point <= endPoint) {
      yield String.fromCharCode(point);
      ++point;
    }
  }
}

void main() {
  Iterable<String> alphabet = "A"..."Z";
  print(alphabet); /// ("A", "B", "C", "D", ... "Z");
}

water-mizuu avatar Sep 26 '23 04:09 water-mizuu

A syntax for range to/until or extension methods would help a lot to reduce boilerplate and time spent writing algorithms on the web. Packages can't be imported and the copy-paste in every file strategy is not ideal.

Wdestroier avatar Sep 26 '23 14:09 Wdestroier

I remember reading a request to omit the var keyword in for-in loops. Replacing for (var c in callbacks) {} with for (c in callbacks) {} probably isn't a huge deal. However, when combined with this feature, writing a loop is much shorter. Compare for (var i = 0; i < 10; i++) and for (i in 0:10).

Wdestroier avatar May 08 '24 08:05 Wdestroier

Using 0:10 for the range is a problem if you want a set of ranges, because {0:10} is a map, not a set with a range. ("All the good syntax is taken!")
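A quick check of how the literal is read today. The set of records at the end is only one possible workaround, not something proposed in this thread.

```dart
void main() {
  // Braces around `key: value` always produce a map literal.
  final literal = {0: 10};
  print(literal is Map<int, int>); // true
  print(literal is Set); // false

  // A set of "ranges" needs some other spelling today, e.g. records of bounds.
  final ranges = {(0, 10), (20, 30)};
  print(ranges is Set<(int, int)>); // true
}
```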

lrhn avatar May 08 '24 08:05 lrhn

"All the good syntax is taken!"

If a syntax that works for for-in loops and collection literals is desired, then replacing var i = 0; i < 10; i++ with i in 0...10 can save a few keystrokes and remain similar to the cascade notation. Assuming that the cascade notation would be an excellent syntax if it didn't already serve a very useful purpose.

Using 0:10 for the range is a problem if you want a set of ranges, because {0:10} is a map, not a set with a range.

Thank you very much for letting me know, @lrhn!

Wdestroier avatar May 08 '24 09:05 Wdestroier

This syntax hasn't been taken yet:

1..=10 // inclusive
1..<10 // exclusive

tatumizer avatar May 08 '24 11:05 tatumizer

Using 0:10 for the range is a problem if you want a set of ranges, because {0:10} is a map, not a set with a range. ("All the good syntax is taken!")

Maybe we could use parentheses for disambiguation.

{1:10} // Map<int, int>
(1:10) // IntRange(1, 10)
{(1:10)} // Set<IntRange>

This could conflict a little with the record syntax for named fields, but AFAIK using a literal as the name of a named field is currently impossible, so in practice it could work.

(a:'z') // Currently possible, Record(a: String)
(a:z) // Currently possible, Record(a: typeOf(z))
('a':'z') // Currently a static error, CharRange

Obviously, in this case we would only be able to use constant values, so this would not be valid:

final start = getStartIndex();
final list = getList();

for (int i in (start:list.length - 1)) ...

mateusfccp avatar May 08 '24 12:05 mateusfccp

As soon as an idiom is established as a part of common vocabulary, it's best to just accept it. The syntax for ranges is essentially the same in most languages (with minor variations), with the base variant being start..end. Since there can be confusion about inclusive/exclusive, some languages disambiguate it by appending = or <, which leads to start..=end and start..<end. (In Dart, start..end conflicts with cascade, so adding another symbol is necessary anyway.)

tatumizer avatar May 08 '24 13:05 tatumizer

I'll admit that the only range syntax I'm even a little familiar with is Python, so [start:end], with both ends being optional.

Dart could introduce something like that. Special user defined operator[:] taking two arguments, both must be nullable, possibly operator[:]= too, a list element of the form e1:e2 which requires ... something of the operands, like implementing Rangeable?

I've seen .. and ... too, but never really used them. Not since Perl and Awk.

In the end, it's two features:

  • a way to introduce an Iterable, so for/in can iterate over it as efficiently as a C-style for loop.
  • a slice operation on existing lists/arrays/iterables, which should not iterate a range, but just uses the start and end directly.

It's a range with known start and end, and a way to iterate from start to end.

It's not completely clear why the two should be combined, other than "range of integers" matching both. If not all ends are integers, then the order of iteration may be more relevant than just knowing the ends.
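A small sketch of how the two features could be kept apart while sharing one range value. `IntRange` and `slice` are hypothetical names and are not part of the SDK; the range is inclusive and ascending only.

```dart
// Hypothetical inclusive, ascending integer range.
class IntRange extends Iterable<int> {
  final int start;
  final int end; // inclusive
  const IntRange(this.start, this.end);

  // Feature 1: iteration from start to end.
  @override
  Iterator<int> get iterator =>
      Iterable<int>.generate(length, (i) => start + i).iterator;

  @override
  int get length => end < start ? 0 : end - start + 1;
}

extension Slice<T> on List<T> {
  // Feature 2: slicing only reads the two bounds; it never iterates the range.
  Iterable<T> slice(IntRange range) => getRange(range.start, range.end + 1);
}

void main() {
  const range = IntRange(1, 2);
  print(range.toList()); // [1, 2]
  print(['foo', 'bar', 'bizz'].slice(range).toList()); // [bar, bizz]
}
```

The slice never asks the range for its elements, which is the sense in which the two features are separable.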

lrhn avatar May 08 '24 15:05 lrhn

Lately, every language I look at supports some form of range literal, usually following Python's syntax. See: zig (look up "range"), swift, rust, C#, Kotlin, Elixir. Julia is an exception: a range can be created by a function call or by using a different kind of range literal (with a colon as a separator):

a = 0.0:0.1:1.0  # 0.0, 0.1, 0.2, ... 1.0

golang (latest version, search for "range") uses a different notation: for i := range 10 {...}, (in golang, "range" is a keyword used for a couple of other purposes).

If you add support for start..<end or start:end (exclusive) for integers only - this will cover most known cases. Most importantly, it may allow us to write for loop more concisely, so we will very rarely need anything other than for in loop. This is the main motivation for ranges IMO. You can always generalize it later, if necessary.

(According to gemini, the start..end notation was first introduced by algol68. It was also supported by Fortran 90)

tatumizer avatar May 08 '24 16:05 tatumizer

In the end, it's two features

Agreed, maybe this should be two different issues. I didn't remember that this issue also proposed a "slice" feature.

mateusfccp avatar May 08 '24 18:05 mateusfccp