rfcs
rfcs copied to clipboard
Add `from` and `to` parameters to Array* iterators
The correct way in Pony to iterate over items seems to be through the for
control structure which uses the Iterator
interface. However if I want to iterate over a subset, I have to use Range
from collections
and use that to index into the array.
Changing .keys()
, .values()
and .pairs()
to take (from: USize, to: USize)
would make this a lot easier. The change seems quite small, but where I'm unsure is how to handle from > .size()
and to < from
. Some might say that these should cause an error
(making these functions partial) and some might argue that they should cause an empty set / iterator.
I propose that the respective Iterator
classes should change their default constructor to:
new create(array: B, from: USize = 0, to: USize = USize.max_value()) =>
_array = array
_i = from
_end = to.min(array.size())
Another issue is arrays that somehow become shorter than the iterator, while it is in use, since the signature of the array iterators are eg. ArrayKeys[A, B: Array[A] #read]
, where only #read
is required.
Oh yeah, I'd be happy to write the PR to ponyc and the RFC too
This makes sense in general to me - we could talk about the details in the RFC.
Note that we could apply this convention not just to Array
but also other random-access sequential collections.
Returning an "empty" iterator would also be very simple. If from
is larger than to
the internal _i
counter would start out larger than _end
in my example and we could do the following:
fun has_next(): Bool =>
_i < _end
Then there wouldn't be any breaking changes from iterator constructor becoming a partial function. But then the constructor would probably have _end = to.min(array.size())
, although the clamping would be "silent".
What about an array becoming smaller than the iterator while it is being looped over?
Currently the only example of a linked-list style iterator, for a class that has Seq
in it's inheritance (?) tree is List
, so we'd need to figure out if Seq
and ReadSeq
should demand fun values()
or that should be another trait.
Here's a diff of what I've had to change for the proposed changes to compile:
diff --git a/packages/builtin/array.pony b/packages/builtin/array.pony
index 4026e2e8f..13f78fcd2 100644
--- a/packages/builtin/array.pony
+++ b/packages/builtin/array.pony
@@ -528,34 +528,36 @@ class Array[A] is Seq[A]
end
end
- fun keys(): ArrayKeys[A, this->Array[A]]^ =>
+ fun keys(from: USize = 0, to: USize = USize.max_value()): ArrayKeys[A, this->Array[A]]^ =>
"""
Return an iterator over the indices in the array.
"""
- ArrayKeys[A, this->Array[A]](this)
+ ArrayKeys[A, this->Array[A]](this, from, to)
- fun values(): ArrayValues[A, this->Array[A]]^ =>
+ fun values(from: USize = 0, to: USize = USize.max_value()): ArrayValues[A, this->Array[A]]^ =>
"""
Return an iterator over the values in the array.
"""
- ArrayValues[A, this->Array[A]](this)
+ ArrayValues[A, this->Array[A]](this, from, to)
- fun pairs(): ArrayPairs[A, this->Array[A]]^ =>
+ fun pairs(from: USize = 0, to: USize = USize.max_value()): ArrayPairs[A, this->Array[A]]^ =>
"""
Return an iterator over the (index, value) pairs in the array.
"""
- ArrayPairs[A, this->Array[A]](this)
+ ArrayPairs[A, this->Array[A]](this, from, to)
class ArrayKeys[A, B: Array[A] #read] is Iterator[USize]
let _array: B
var _i: USize
+ let _end: USize
- new create(array: B) =>
+ new create(array: B, from: USize = 0, to: USize = USize.max_value()) =>
_array = array
- _i = 0
+ _i = from
+ _end = to.min(array.size())
fun has_next(): Bool =>
- _i < _array.size()
+ _i < _end
fun ref next(): USize =>
if _i < _array.size() then
@@ -567,13 +569,15 @@ class ArrayKeys[A, B: Array[A] #read] is Iterator[USize]
class ArrayValues[A, B: Array[A] #read] is Iterator[B->A]
let _array: B
var _i: USize
+ let _end: USize
- new create(array: B) =>
+ new create(array: B, from: USize = 0, to: USize = USize.max_value()) =>
_array = array
- _i = 0
+ _i = from
+ _end = to.min(array.size())
fun has_next(): Bool =>
- _i < _array.size()
+ _i < _end
fun ref next(): B->A ? =>
_array(_i = _i + 1)?
@@ -585,13 +589,15 @@ class ArrayValues[A, B: Array[A] #read] is Iterator[B->A]
class ArrayPairs[A, B: Array[A] #read] is Iterator[(USize, B->A)]
let _array: B
var _i: USize
+ let _end: USize
- new create(array: B) =>
+ new create(array: B, from: USize = 0, to: USize = USize.max_value()) =>
_array = array
- _i = 0
+ _i = from
+ _end = to.min(array.size())
fun has_next(): Bool =>
- _i < _array.size()
+ _i < _end
fun ref next(): (USize, B->A) ? =>
(_i, _array(_i = _i + 1)?)
diff --git a/packages/builtin/read_seq.pony b/packages/builtin/read_seq.pony
index c273fab65..8e7b03f95 100644
--- a/packages/builtin/read_seq.pony
+++ b/packages/builtin/read_seq.pony
@@ -13,7 +13,7 @@ interface box ReadSeq[A]
is out of bounds. Note that this returns A^, not this->A.
"""
- fun values(): Iterator[this->A]^
+ fun values(from: USize = 0, to: USize = USize.max_value()): Iterator[this->A]^
"""
Returns an iterator over the elements of the sequence. Note that this
iterates over A^, not this->A.
diff --git a/packages/builtin/seq.pony b/packages/builtin/seq.pony
index 233e7d5a3..c2bb821bc 100644
--- a/packages/builtin/seq.pony
+++ b/packages/builtin/seq.pony
@@ -75,7 +75,7 @@ interface Seq[A]
If the sequence is already smaller than len, do nothing.
"""
- fun values(): Iterator[this->A]^
+ fun values(from: USize = 0, to: USize = USize.max_value()): Iterator[this->A]^
"""
Returns an iterator over the elements of the sequence.
"""
diff --git a/packages/builtin/string.pony b/packages/builtin/string.pony
index 10dde5eaf..35cae2d43 100644
--- a/packages/builtin/string.pony
+++ b/packages/builtin/string.pony
@@ -1438,17 +1438,17 @@ actor Main
fun string(): String iso^ =>
clone()
- fun values(): StringBytes^ =>
+ fun values(from: USize = 0, to: USize = USize.max_value()): StringBytes^ =>
"""
Return an iterator over the bytes in the string.
"""
- StringBytes(this)
+ StringBytes(this, from, to)
- fun runes(): StringRunes^ =>
+ fun runes(from: USize = 0, to: USize = USize.max_value()): StringRunes^ =>
"""
Return an iterator over the codepoints in the string.
"""
- StringRunes(this)
+ StringRunes(this, from, to)
fun ref _set(i: USize, value: U8): U8 =>
"""
@@ -1459,13 +1459,15 @@ actor Main
class StringBytes is Iterator[U8]
let _string: String box
var _i: USize
+ let _end: USize
- new create(string: String box) =>
+ new create(string: String box, from: USize = 0, to: USize = USize.max_value()) =>
_string = string
- _i = 0
+ _i = from
+ _end = to.min(string.size())
fun has_next(): Bool =>
- _i < _string.size()
+ _i < _end
fun ref next(): U8 ? =>
_string(_i = _i + 1)?
@@ -1473,13 +1475,15 @@ class StringBytes is Iterator[U8]
class StringRunes is Iterator[U32]
let _string: String box
var _i: USize
+ let _end: USize
- new create(string: String box) =>
+ new create(string: String box, from: USize = 0, to: USize = USize.max_value()) =>
_string = string
- _i = 0
+ _i = from
+ _end = to.min(string.size())
fun has_next(): Bool =>
- _i < _string.size()
+ _i < _end
fun ref next(): U32 ? =>
(let rune, let len) = _string.utf32(_i.isize())?
diff --git a/packages/collections/list.pony b/packages/collections/list.pony
index 1672bfc72..ed2903a31 100644
--- a/packages/collections/list.pony
+++ b/packages/collections/list.pony
@@ -530,7 +530,7 @@ class List[A] is Seq[A]
"""
ListNodes[A, this->ListNode[A]](_head, true)
- fun values(): ListValues[A, this->ListNode[A]]^ =>
+ fun values(a: USize = 0, b: USize = USize.max_value()): ListValues[A, this->ListNode[A]]^ =>
"""
Return an iterator on the values in the list.
"""
What about an array becoming smaller than the iterator while it is being looped over?
IIRC we already don't have any behavioural guarantees for mutating a collection that you're iterating over, so I think the answer that "behaviour in such a case is undefined" is the best we can say.
@emilbayes Regarding the question of whether List
(and other linked-lists) should accept from
and to
parameters in the values
method, I'd say that because they already support indexed random-access (via the apply
and update
methods), adding from
and to
constraints to their iterators should be fair game.
Also, I notice that your diff is missing the collections/persistent
subpackage - you probably want to spruce those up as well.
Thanks for the input @jemc
Re: List: Should the "price" of forwarding to from
be paid in the iterator constructor? Is that intuitive? (ofc it should be documented)
Re: collections/persistent
: Will do! I guess these didn't show up as they're not part of any of the traits I changed. They also look more tricky at first glance (I haven't used them before)
Re: List: Should the "price" of forwarding to from be paid in the iterator constructor? Is that intuitive? (ofc it should be documented)
Seems intuitive to me, but maybe I'm not understanding you correctly.
I just want it to be unsurprising that a large from
with List
will incur some kind of cost, unlike with Array
:)
Makes sense - I think documentation is enough for that. As I said above, if it's already allowing random access via apply
, it's not like you're adding something new in terms of jump cost, as I see it.
@emilbayes are you still interested in writing an RFC for this?
@jemc are you still in favor of this, if yes, I can take on writing an RFC as it seems relatively straightforward to write.
Yes, I'm still in favor of it.
Some thoughts/discussion points...
If we have some functions (existing or new) that has from
and to
parameters that result in "can't compute" it would make sense to have the function(s) be partial. I don't like the idea of "empty set iterator" for "you gave incorrect input". That's a nice way to hide a bug.
- I don't want to make the existing methods partial. Ewwww...
- I don't much like the idea of adding additional methods for "a range".
Perhaps... ArrayKeys
et al are modified and keys, values, etc methods are left as is. Something like this...
new free_candy(array: B, from: Usize, to: USize) ? =>
...
new free_candy_from(x: ArrayKeys[A, B], from: USize, to: USize) ? =>
Where you can either.. create an ArrayKeys directly that is constrained to a subset of the Array or you can create from an existing ArrayKeys.
Thoughts Joe? Technically, people can do these on their own without it being incorporated into the standard library. There's nothing to stop someone from creating these iterators on their own. The only issue is "is there a method to get directly from Array et al". I'm quite ok with Array et al have keys, values etc only return "most commonly used iterator". Modifying Array to support a new iterator type is opening up to continual churn to support others in the future.
I do think that modifying ArrayKeys etc to support some kind of "range" seems completely fine though and would be handy to a wide enough audience to include as a non-breaking change into the standard library.
If folks are generally in agreement with this approach, I can write an RFC that includes what classes would be augmented and with an exemplar implementation.
I don't like the idea of "empty set iterator" for "you gave incorrect input". That's a nice way to hide a bug.
Personally, I'm in the camp of returning an empty iterator for such cases. Which would be consistent with how we already handle out-of-bound range-restricting methods.
All of the range-restricting methods I could find in builtin
use the pattern of silently clipping the requested range into bounds:
-
Array.slice
-
Array.trim
-
Array.trim_in_place
-
String.substring
-
String.trim
-
String.trim_in_place
-
String.cut
-
String.cut_in_place
-
String.codepoints
To me it makes perfect sense for the following two expressions to have the same result:
my_array.slice(start, finish).values()
my_array.values(start, finish)
I'm not personally inclined to change values
without there being a compelling performance case given that you can do your first example easily enough. I doubt that people would be doing this in hot paths, so the existing way of using slice, seems fine to me. Given that and a general lack of agreement, I don't think I will move forward with writing an RFC for this.
I doubt that people would be doing this in hot paths, so the existing way of using slice, seems fine to me.
I may be missing something obvious, but I'm curious: why would you doubt that people would use this mechanism in a hot path?
If I was in a hot path, I wouldn't want to repeatedly create iterator objects, I'd index directly. Save repeated object creation in the hot path. And if I'm not in a hot path then the extra cost of slice then iterator it pretty trivial to me.
One more issue here: As is, Pony does not treat default arguments as not present, when matching interfaces. This would cause the Array class to not be some sequence interfaces that require values()
Good point - I think to move this forward we'd also want to update the other values
iterators (and interfaces of them) as well.