SIP-70 - Flexible Varargs
Do you want to cite https://github.com/scala/improvement-proposals/pull/41 ? There are a lot of previous discussions on this feature.
This SIP is a renamed version of https://github.com/scala/improvement-proposals/pull/41 , and also adds an intermediate step to create a temporary seq.
applyBegin->newBuilderapplyNext->addOneapplyNextSeq->addAllapplyEnd->result
I started to wander if this SIP could help resolve this issue: https://github.com/scala/scala3/issues/18009
In short, the problem there is that an invocation of a curried method added by a refinement on a subtype of Selectable, e.g.
mySelectable.foo(args1*)(args2*)
should get desugared to something like
mySelectable.applyDynamic("foo")(args1*, args2*).asInstanceOf[Foo]
which is now illegal, but would be when this SIP gets implemented
@odersky @sjrd @lrytz have any of you had a chance to look at this? It's been a month since reviewers were assigned so I'm hoping to get some feedback
I also think this SIP is a very nice improvement.
You mention / suggest to use IArray, which is an opaque type alias and not part of the collection hierarchy. sum(IArray.newBuilder[Int].addOne(1).result()*) wraps the resulting array using IArray.wrapIntArray, which converts it to an immutable.ArraySeq. So it's probably better to use ArraySeq directly. Or maybe Vector?
ArraySeq: can wrap a primitive array, minimal memory overheadVector: better if the collections is later used for appending / prepending, concatenation etc. Memory and computational overhead is small. Primitives are boxed.
I'm not sure if the newBuilder call needs an explicit type argument, or if we can have that inferred by the typer. In Scala 3:
scala> collection.immutable.ArraySeq.newBuilder.addOne(1).result()
-- [E172] Type Error: ----------------------------------------------------------
1 |collection.immutable.ArraySeq.newBuilder.addOne(1).result()
| ^
| No ClassTag available for Any
1 error found
Interestingly, the same line works on Scala 2, the compiler infers newBuilder[Int]. Is that a known limitation @odersky?
If an explicit type argument is needed, the SIP should specify how that type is obtained.
For patterns, we'll need to update the spec to say how case E(p1, ..., pN, s*, u1, ..., uM) translates. I think the section is https://scala-lang.org/files/archive/spec/3.4/08-pattern-matching.html#pattern-sequences, the spec seems not to be fully up to date though, it doesn't reflect https://github.com/scala/scala3/pull/11240.
This SIP doesn't change repeated parameters (of case classes), a repeated parameter will still be last in a case classs definition. An unapplySeq declaration should also continue to have the corresponding return type (T1, ..., Tm, Seq[S]).
So the spec needs to map the new pattern shape onto to current case class parameter list shape (or unapplySeq return type shape). Note that T1, ..., Tm can be arbitrary types, all different from each other and different from S. So I'm not sure the proposal of translating case E(p1, ..., pN, s*, u1, ..., uM) to case VarargsMatcher(Seq(p1, ..., pN), s, Seq(u1, ..., uM)) would work.
You mention / suggest to use
IArray, which is an opaque type alias and not part of the collection hierarchy.sum(IArray.newBuilder[Int].addOne(1).result()*)wraps the resulting array usingIArray.wrapIntArray, which converts it to animmutable.ArraySeq. So it's probably better to useArraySeqdirectly. Or maybeVector?
The motivation for IArray was to try and decouple the language feature from the concrete implementation of the Scala collections library: IArray seems like a relatively simple collection with relatively good performance characteristics. In particular, I wanted to:
- Avoid using
Seqwhich defaults toListwhich is from the start a lot less cache-friendly thanIArray. Vectorhas the downside of being a pretty specific collection in the standard library, with a very-non-trivial implementation, and AFAIK is something the Scala language was not aware of up to this point.
This SIP doesn't change repeated parameters (of case classes), a repeated parameter will still be last in a case classs definition.
Yes, definitions are unchanged. Not just for case classes, but for method defs as well, they all can only have one vararg* parameter at the end
An
unapplySeqdeclaration should also continue to have the corresponding return type(T1, ..., Tm, Seq[S]).
Yes that is right. The trailing Seq[S] can be broken up into a smaller Seq in the middle and loose elements before/after it during destructuring, but these elements all continue to be homogeneously typed as S, and the signature of the unapplySeq does not need to change
Interestingly, the same line works on Scala 2, the compiler infers newBuilder[Int]. Is that a known limitation @odersky?
No, I did not know that.
Generally, I think it's better of the SIP does not specify a precise implementation scheme. The scheme might change in the future, and we should not have to change the SIP because of that. The IArray code can be considered for illustration, of course.
In terms of what we have, it looks like basing the whole thing on ArrayBuilder might work.
IArrayseems like a relatively simple collection with relatively good performance characteristics
It's ArraySeq that ends up being used – which seems a practical choice to me. Mentioning IArray is just a bit confusing, it requires the presence of an implicit conversion.
The spec says when using a sequence argument xs*, the repeated parameter is considered to be of type scala.Seq[S]. I agree using the default scala.Seq factory (List) would not be great. Maybe the SIP/spec can leave it as an implementation detail; f(x, xs*, y) is translated to Seq.newBuilder.addOne(x).addAll(xs).addOne(y).result(), but a compiler may use a different factory such that the result conforms to scala.Seq[S].
The trailing
Seq[S]can be broken up into a smallerSeqin the middle and loose elements before/after it during destructuring, but these elements all continue to be homogeneously typed asS, and the signature of theunapplySeqdoes not need to change
Right. I was thinking what happens when using subpatterns and then got a bit lost in spec. The VarargsMatchHelper seems to work fine for that.
def f(x: Any): String = x match {
case Seq(Some(_), y: String, xs*, 1) => y
}
// translates to
def f(x: Any): String = {
val VarargsMatcher = new VarargsMatchHelper(2, 1)
x match {
case VarargsMatcher(Seq(Some(_), y: String), xs, Seq(1)) => y
}
}
There is now an implementation of most of the proposal.
One doubt I am having is spreads of Option. We do not allow an option to be passed individually to a vararg parameter. I.e. the following would not work:
def foo(x: Int*) = x.sum
val xo = Some(2)
f(xo*)
That's an error since the argument of a spread operator must be a Seq or an Array. Do we really want to change that? That would affect normal spreads not just spreads embedded in a longer list. There might be use cases for embedded options, but for a single spread passed to a vararg it feels a bit weird. Alternatively, we can demand an explicit toList, i.e.
f(xo.toList*)
would work today. I'd be interested in getting the committee member's opinion on this aspect before I try to implement it.
My motivation for including a special case for Option is that empirically it's super common to have "optional elements" in a sequence or sequence literal. Libraries like ScalaTags' Frag, OS-Lib's Shellable all allow this because it's such a common need. Other Iterables like Set or Map are not nearly as commonly spliced due to their nondeterministic ordering, a problem that Options don't have.
The use of a naked foo(opt*) is certainly a bit unusual, but it doesn't seem wrong. It doesn't seem worth it to make users jump through hoops in other scenarios by adding .toList everywhere just to prohibit the foo(opt*) scenario which is actually fine, and nobody really cares about either way anyway.
A third option is to allow splicing options in longer sequences such as foo(opt1*, opt2*), but prohibiting it the alone e.g. foo(opt*). Can be done, but again it seems like we're just adding irregularity just for the sake of it without any real benefit.
So IMO might as well allow foo(opt*) for the sake of (a) regularity and (b) convenience in other cases where it actually does matter
Perhaps a fourth alternative is that we add a new marker trait trait VarargUnpackable[T] that Option and Seq extend, and make foo(bar*) work on any VarargUnpackable. VarargUnpackable could have whatever abstract methods are necessary to efficiently participate in the flexible varargs desugaring. e.g. one possible definition is
trait VarargUnpackable[+T]{
// Used when unpacking a single value, can
// efficiently return `this` for `Seq`s
def toSeq: Seq[T]
// Used when unpacking flexible varargs to
// efficiently feed the elements of this object
// into a `Seq.builder`
def copyToArray(dest: Array[T], startIndex: Int): Unit
// Used when unpacking flexible varargs to allow
// the builder to expand its capacity precisely
// when the size of this is known. Otherwise
// returns `-1` indicating the size it not known
// and the builder would have to handle expansion
// using its own heuristics
def sizeHint: Int
// When sizeHint is -1, we cannot pre-size the
// builder array, and so we append elements one
// at a time to give the builder a chance to resize
// as necessary
def foreach(f: T => Unit): Unit
}
object VarargUnpackable{
class Builder[T]{
def addAll(v: VarargUnpackable[T])
def add(v: T)
def result(): Seq[T]
}
}
// Desugaring
foo(bar*)
foo(VarargUnpackable.Builder().addAll(bar).build())
foo(bar*, qux*)
foo(VarargUnpackable.Builder().addAll(bar).addAll(qux).build())
foo(value1, bar*, value2, qux*, value3)
foo(VarargUnpackable.Builder().add(value1).addAll(bar).add(value2).addAll(qux).add(value3).build())
The VarargUnpackable trait above should be the minimal API necessary to efficiently build the Seq[T] we need to provide to the body of a vararg-taking method. This included allowing use of high-performance batch copy methods like System.arraycopy, for implementers of VarargUnpackable that can support them (e.g. ArraySeq, ArrayBuffer, ArrayDeque) while other collections can implement copyToArray in a more naive foreach/iterator-based fashion. The builder desugaring above is similar to the builder approach taken in Java's collection literals discussions. We could simplify VarargUnpackable's members to just def foreach if we didn't care about performance, but it may be worth the slight additional complexity to ensure we're doing things efficiently
That would avoid hardcoding Option and Seq, and have the upside that user-specified types will be able to hook into this syntax without having to extend Seq[T] themselves which we all know is very error prone and discouraged.
A fifth alternative would be to use a name-based desugaring so foo(bar*, qux*) calls methods on bar and qux without any requirement of them being a particular type. Similar to the trait-based approach above, but perhaps a bit more flexible.
Why not just let Option and Seq extend a newly introduced trait OrderedIterable[+A] extends Iterable[A] if the reason to exclude Set is ordering.
@Atry I think it would be good to have the language feature depend on as minimal an interface as possible, so a minimal VarargUnpackable[+T] trait would be preferable since it doesn't extend Iterable and force implementers to support the rich API of the Iterable trait
I think all this goes to far. Varargs are Seq's. I see no reason to complicate things further here. Arrays can be passed to Seq's since there is an implicit conversion. The point of SIP 70 is to stay within this framework and allow multiple spreads. My PR provides that. Options unfortunately are an irregular case since there is no conversion from option to Seq (and for good reason!). I think it is much cleaner to keep that principle even if some use cases require an explicit .toList.
Here's another way to put it: I can have a vararg function and a spread argument to it:
def f(x: Int*)
f(xs*)
I can convert the vararg to a Seq and drop that the spread:
def f(x: Seq[Int])
f(xs)
That works, always. I have done that refactoring many times. But allowing options in spreads breaks it.
Generally varargs are tricky to get right since they need to be treated early during type inference when types are not yet known completely. That compounds complexity. So we need to keep it simple.
I agree with @odersky. I don't question that there are use cases for spreading Options (I can see a few spots in my code where I would definitely use it). The irregularity is a bit too much in this case, though. Having to add a .toList in order to be able to spread it seems like an acceptable tradeoff in order to retain the relative simplicity of varargs.
I think it would indeed be interesting if we had a minimal interface for varargs and spreads instead of the Seq we have now. That would go in the general direction to minimize the language mandated footprint of the standard library. But it would raise a lot of issues, including how to organize migration and how to keep backwards compatibility. So I think this would have to be a different SIP.
As a long-term Scala user, I strongly support @lihaoyi's arguments. Optiontype is so essential to everyday Scala programmer life ergonomics that I would love to see it accepted friction-free whenever possible, not less.
Maybe we can add .cc as a method that turns a non-collection type into a collection. name.cc* doesn't seem too bad. Even x.cc* doesn't seem too bad to me. x.toList* is a lot more effort.
If we allowed a spread type then we could use a unary * suffix operator to simply allow x.* while the ordinary collection would be x*. As it is, the vararg type isn't first-class, though, so you can't use it this way.
https://github.com/scala/scala3/pull/23855 now also implements the pattern matching specified in the SIP. This means the implementation is complete, except for the contentious point whether we want to support Option. (I still think we should not).
We can probably merge the Seq-only implementation first and follow up with the Option handling later: follow up discussion in the SIP meeting, follow up PR, or follow up SIP entirely. No need to block the meat of this proposal on an edge case when the bulk of it is uncontroversial
Let's discuss this point next week in the SIP meeting.
@Ichoran wrote:
Maybe we can add
.ccas a method that turns a non-collection type into a collection.name.cc*doesn't seem too bad. Evenx.cc*doesn't seem too bad to me.x.toList*is a lot more effort.
We already have Some(42).toSeq, which is not that bad, I think...
On the September 26 meeting we agreed to remove Option spreads from this SIP proposal, that is still pending.
The implementation (https://github.com/scala/scala3/pull/23855, available as experimental in 3.8) does not have Option spreads.
@lrytz I updated the doc to move the Option handling under ## Limitations since we no longer support it
Please note that the SIP template has changed:
- The SIP number goes into a
number: NNYAML header, and titles should not be prefixed withSIP NN - - The file name should be prefixed with the SIP number, padded to three digits.
Thanks!