rfcs Add `from` and `to` parameters to Array* iterators

The correct way in Pony to iterate over items seems to be through the for control structure which uses the Iterator interface. However if I want to iterate over a subset, I have to use Range from collections and use that to index into the array.

Changing .keys(), .values() and .pairs() to take (from: USize, to: USize) would make this a lot easier. The change seems quite small, but where I'm unsure is how to handle from > .size() and to < from. Some might say that these should cause an error (making these functions partial) and some might argue that they should cause an empty set / iterator.

I propose that the respective Iterator classes should change their default constructor to:

  new create(array: B, from: USize = 0, to: USize = USize.max_value()) =>
    _array = array
    _i = from
    _end = to.min(array.size())

Another issue is arrays that somehow become shorter than the iterator, while it is in use, since the signature of the array iterators are eg. ArrayKeys[A, B: Array[A] #read], where only #read is required.

Aug 02 '17 19:08 emilbayes

Oh yeah, I'd be happy to write the PR to ponyc and the RFC too

Aug 02 '17 19:08 emilbayes

This makes sense in general to me - we could talk about the details in the RFC.

Note that we could apply this convention not just to Array but also other random-access sequential collections.

Aug 02 '17 20:08 jemc

Returning an "empty" iterator would also be very simple. If from is larger than to the internal _i counter would start out larger than _end in my example and we could do the following:

fun has_next(): Bool =>
    _i < _end

Then there wouldn't be any breaking changes from iterator constructor becoming a partial function. But then the constructor would probably have _end = to.min(array.size()), although the clamping would be "silent".

What about an array becoming smaller than the iterator while it is being looped over?

Aug 02 '17 20:08 emilbayes

Currently the only example of a linked-list style iterator, for a class that has Seq in it's inheritance (?) tree is List, so we'd need to figure out if Seq and ReadSeq should demand fun values() or that should be another trait.

Here's a diff of what I've had to change for the proposed changes to compile:

diff --git a/packages/builtin/array.pony b/packages/builtin/array.pony
index 4026e2e8f..13f78fcd2 100644
--- a/packages/builtin/array.pony
+++ b/packages/builtin/array.pony
@@ -528,34 +528,36 @@ class Array[A] is Seq[A]
       end
     end
 
-  fun keys(): ArrayKeys[A, this->Array[A]]^ =>
+  fun keys(from: USize = 0, to: USize = USize.max_value()): ArrayKeys[A, this->Array[A]]^ =>
     """
     Return an iterator over the indices in the array.
     """
-    ArrayKeys[A, this->Array[A]](this)
+    ArrayKeys[A, this->Array[A]](this, from, to)
 
-  fun values(): ArrayValues[A, this->Array[A]]^ =>
+  fun values(from: USize = 0, to: USize = USize.max_value()): ArrayValues[A, this->Array[A]]^ =>
     """
     Return an iterator over the values in the array.
     """
-    ArrayValues[A, this->Array[A]](this)
+    ArrayValues[A, this->Array[A]](this, from, to)
 
-  fun pairs(): ArrayPairs[A, this->Array[A]]^ =>
+  fun pairs(from: USize = 0, to: USize = USize.max_value()): ArrayPairs[A, this->Array[A]]^ =>
     """
     Return an iterator over the (index, value) pairs in the array.
     """
-    ArrayPairs[A, this->Array[A]](this)
+    ArrayPairs[A, this->Array[A]](this, from, to)
 
 class ArrayKeys[A, B: Array[A] #read] is Iterator[USize]
   let _array: B
   var _i: USize
+  let _end: USize
 
-  new create(array: B) =>
+  new create(array: B, from: USize = 0, to: USize = USize.max_value()) =>
     _array = array
-    _i = 0
+    _i = from
+    _end = to.min(array.size())
 
   fun has_next(): Bool =>
-    _i < _array.size()
+    _i < _end
 
   fun ref next(): USize =>
     if _i < _array.size() then
@@ -567,13 +569,15 @@ class ArrayKeys[A, B: Array[A] #read] is Iterator[USize]
 class ArrayValues[A, B: Array[A] #read] is Iterator[B->A]
   let _array: B
   var _i: USize
+  let _end: USize
 
-  new create(array: B) =>
+  new create(array: B, from: USize = 0, to: USize = USize.max_value()) =>
     _array = array
-    _i = 0
+    _i = from
+    _end = to.min(array.size())
 
   fun has_next(): Bool =>
-    _i < _array.size()
+    _i < _end
 
   fun ref next(): B->A ? =>
     _array(_i = _i + 1)?
@@ -585,13 +589,15 @@ class ArrayValues[A, B: Array[A] #read] is Iterator[B->A]
 class ArrayPairs[A, B: Array[A] #read] is Iterator[(USize, B->A)]
   let _array: B
   var _i: USize
+  let _end: USize
 
-  new create(array: B) =>
+  new create(array: B, from: USize = 0, to: USize = USize.max_value()) =>
     _array = array
-    _i = 0
+    _i = from
+    _end = to.min(array.size())
 
   fun has_next(): Bool =>
-    _i < _array.size()
+    _i < _end
 
   fun ref next(): (USize, B->A) ? =>
     (_i, _array(_i = _i + 1)?)
diff --git a/packages/builtin/read_seq.pony b/packages/builtin/read_seq.pony
index c273fab65..8e7b03f95 100644
--- a/packages/builtin/read_seq.pony
+++ b/packages/builtin/read_seq.pony
@@ -13,7 +13,7 @@ interface box ReadSeq[A]
     is out of bounds. Note that this returns A^, not this->A.
     """
 
-  fun values(): Iterator[this->A]^
+  fun values(from: USize = 0, to: USize = USize.max_value()): Iterator[this->A]^
     """
     Returns an iterator over the elements of the sequence. Note that this
     iterates over A^, not this->A.
diff --git a/packages/builtin/seq.pony b/packages/builtin/seq.pony
index 233e7d5a3..c2bb821bc 100644
--- a/packages/builtin/seq.pony
+++ b/packages/builtin/seq.pony
@@ -75,7 +75,7 @@ interface Seq[A]
     If the sequence is already smaller than len, do nothing.
     """
 
-  fun values(): Iterator[this->A]^
+  fun values(from: USize = 0, to: USize = USize.max_value()): Iterator[this->A]^
     """
     Returns an iterator over the elements of the sequence.
     """
diff --git a/packages/builtin/string.pony b/packages/builtin/string.pony
index 10dde5eaf..35cae2d43 100644
--- a/packages/builtin/string.pony
+++ b/packages/builtin/string.pony
@@ -1438,17 +1438,17 @@ actor Main
   fun string(): String iso^ =>
     clone()
 
-  fun values(): StringBytes^ =>
+  fun values(from: USize = 0, to: USize = USize.max_value()): StringBytes^ =>
     """
     Return an iterator over the bytes in the string.
     """
-    StringBytes(this)
+    StringBytes(this, from, to)
 
-  fun runes(): StringRunes^ =>
+  fun runes(from: USize = 0, to: USize = USize.max_value()): StringRunes^ =>
     """
     Return an iterator over the codepoints in the string.
     """
-    StringRunes(this)
+    StringRunes(this, from, to)
 
   fun ref _set(i: USize, value: U8): U8 =>
     """
@@ -1459,13 +1459,15 @@ actor Main
 class StringBytes is Iterator[U8]
   let _string: String box
   var _i: USize
+  let _end: USize
 
-  new create(string: String box) =>
+  new create(string: String box, from: USize = 0, to: USize = USize.max_value()) =>
     _string = string
-    _i = 0
+    _i = from
+    _end = to.min(string.size())
 
   fun has_next(): Bool =>
-    _i < _string.size()
+    _i < _end
 
   fun ref next(): U8 ? =>
     _string(_i = _i + 1)?
@@ -1473,13 +1475,15 @@ class StringBytes is Iterator[U8]
 class StringRunes is Iterator[U32]
   let _string: String box
   var _i: USize
+  let _end: USize
 
-  new create(string: String box) =>
+  new create(string: String box, from: USize = 0, to: USize = USize.max_value()) =>
     _string = string
-    _i = 0
+    _i = from
+    _end = to.min(string.size())
 
   fun has_next(): Bool =>
-    _i < _string.size()
+    _i < _end
 
   fun ref next(): U32 ? =>
     (let rune, let len) = _string.utf32(_i.isize())?
diff --git a/packages/collections/list.pony b/packages/collections/list.pony
index 1672bfc72..ed2903a31 100644
--- a/packages/collections/list.pony
+++ b/packages/collections/list.pony
@@ -530,7 +530,7 @@ class List[A] is Seq[A]
     """
     ListNodes[A, this->ListNode[A]](_head, true)
 
-  fun values(): ListValues[A, this->ListNode[A]]^ =>
+  fun values(a: USize = 0, b: USize = USize.max_value()): ListValues[A, this->ListNode[A]]^ =>
     """
     Return an iterator on the values in the list.
     """

Aug 02 '17 22:08 emilbayes

What about an array becoming smaller than the iterator while it is being looped over?

IIRC we already don't have any behavioural guarantees for mutating a collection that you're iterating over, so I think the answer that "behaviour in such a case is undefined" is the best we can say.

Aug 02 '17 22:08 jemc

@emilbayes Regarding the question of whether List (and other linked-lists) should accept from and to parameters in the values method, I'd say that because they already support indexed random-access (via the apply and update methods), adding from and to constraints to their iterators should be fair game.

Aug 02 '17 22:08 jemc

Also, I notice that your diff is missing the collections/persistent subpackage - you probably want to spruce those up as well.

Aug 02 '17 22:08 jemc

Thanks for the input @jemc

Re: List: Should the "price" of forwarding to from be paid in the iterator constructor? Is that intuitive? (ofc it should be documented)

Re: collections/persistent: Will do! I guess these didn't show up as they're not part of any of the traits I changed. They also look more tricky at first glance (I haven't used them before)

Aug 02 '17 23:08 emilbayes

Re: List: Should the "price" of forwarding to from be paid in the iterator constructor? Is that intuitive? (ofc it should be documented)

Seems intuitive to me, but maybe I'm not understanding you correctly.

Aug 02 '17 23:08 jemc

I just want it to be unsurprising that a large from with Listwill incur some kind of cost, unlike with Array :)

Aug 02 '17 23:08 emilbayes

Makes sense - I think documentation is enough for that. As I said above, if it's already allowing random access via apply, it's not like you're adding something new in terms of jump cost, as I see it.

Aug 02 '17 23:08 jemc

@emilbayes are you still interested in writing an RFC for this?

Feb 04 '22 15:02 SeanTAllen

@jemc are you still in favor of this, if yes, I can take on writing an RFC as it seems relatively straightforward to write.

Jan 02 '23 22:01 SeanTAllen

Yes, I'm still in favor of it.

Jan 03 '23 16:01 jemc

Some thoughts/discussion points...

If we have some functions (existing or new) that has from and to parameters that result in "can't compute" it would make sense to have the function(s) be partial. I don't like the idea of "empty set iterator" for "you gave incorrect input". That's a nice way to hide a bug.

I don't want to make the existing methods partial. Ewwww...
I don't much like the idea of adding additional methods for "a range".

Perhaps... ArrayKeys et al are modified and keys, values, etc methods are left as is. Something like this...

  new free_candy(array: B, from: Usize, to: USize) ? =>
    ...
    
  new free_candy_from(x: ArrayKeys[A, B], from: USize, to: USize) ? =>

Where you can either.. create an ArrayKeys directly that is constrained to a subset of the Array or you can create from an existing ArrayKeys.

Thoughts Joe? Technically, people can do these on their own without it being incorporated into the standard library. There's nothing to stop someone from creating these iterators on their own. The only issue is "is there a method to get directly from Array et al". I'm quite ok with Array et al have keys, values etc only return "most commonly used iterator". Modifying Array to support a new iterator type is opening up to continual churn to support others in the future.

I do think that modifying ArrayKeys etc to support some kind of "range" seems completely fine though and would be handy to a wide enough audience to include as a non-breaking change into the standard library.

If folks are generally in agreement with this approach, I can write an RFC that includes what classes would be augmented and with an exemplar implementation.

Jan 05 '23 13:01 SeanTAllen

I don't like the idea of "empty set iterator" for "you gave incorrect input". That's a nice way to hide a bug.

Personally, I'm in the camp of returning an empty iterator for such cases. Which would be consistent with how we already handle out-of-bound range-restricting methods.

All of the range-restricting methods I could find in builtin use the pattern of silently clipping the requested range into bounds:

Array.slice
Array.trim
Array.trim_in_place
String.substring
String.trim
String.trim_in_place
String.cut
String.cut_in_place
String.codepoints

To me it makes perfect sense for the following two expressions to have the same result:

my_array.slice(start, finish).values()
my_array.values(start, finish)

Jan 05 '23 18:01 jemc

I'm not personally inclined to change values without there being a compelling performance case given that you can do your first example easily enough. I doubt that people would be doing this in hot paths, so the existing way of using slice, seems fine to me. Given that and a general lack of agreement, I don't think I will move forward with writing an RFC for this.

Jan 05 '23 18:01 SeanTAllen

I doubt that people would be doing this in hot paths, so the existing way of using slice, seems fine to me.

I may be missing something obvious, but I'm curious: why would you doubt that people would use this mechanism in a hot path?

Jan 06 '23 18:01 jemc

If I was in a hot path, I wouldn't want to repeatedly create iterator objects, I'd index directly. Save repeated object creation in the hot path. And if I'm not in a hot path then the extra cost of slice then iterator it pretty trivial to me.

Jan 06 '23 18:01 SeanTAllen

One more issue here: As is, Pony does not treat default arguments as not present, when matching interfaces. This would cause the Array class to not be some sequence interfaces that require values()

Jan 10 '23 19:01 jasoncarr0

Good point - I think to move this forward we'd also want to update the other values iterators (and interfaces of them) as well.

Jan 10 '23 19:01 jemc

rfcs rfcs copied to clipboard

Add `from` and `to` parameters to Array* iterators

rfcs
rfcs copied to clipboard