bug
bug copied to clipboard
Performance gap of calling `toArray` on `Array[Any]` as `Seq[Any]` between using Scala 2.12 and Scala 2.13
Reproduction steps
Scala version: Scala 2.13.8
val valuesPerIteration: Long = 1000 * 1000 * 10
val arraySize = 1000
val arr: Seq[Any] = new Array[Any](arraySize)
var n = 0
while (n < valuesPerIteration) {
val array = arr.toArray
n += 1
}
Problem
When using Scala 2.12: 0.3 ns per row
OpenJDK 64-Bit Server VM 1.8.0_322-b06 on Mac OS X 11.4
Apple M1
toArray: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
arrayOfAnyAsSeq 3 3 0 3203.9 0.3 1.0X
When using Scala 2.13: 285.3 ns per row
OpenJDK 64-Bit Server VM 1.8.0_322-b06 on Mac OS X 11.4
Apple M1
toArray: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
arrayOfAnyAsSeq 2853 3179 461 3.5 285.3 1.0X
When the arraySize
is 1000, the above scenario is 951 times slower when using Scala 2.13 than when using Scala 2.12, and with the increase of 'arraySize', the performance gap will also increase.
When call toArray
on Array[Anry] as Seq[Any]
:
- Using Scala 2.12, the class type of
arr
ismutable.WrappedArray$ofRef
, thetoAarry
method did not trigger memory copy - Using Scala 2.13, the class type of
arr
isimmutable.ArraySeq$ofRef
, thetoAarry
method trigger memory copy
So should we rewrite toAarry
for immutable.ArraySeq
and mutable.ArraySeq
in Scala 2.13 ?
cc @SethTisue
@scala/collections ?
Offhand, this looks like a result of Seq
meaning something different in Scala 2.12 and 2.13, as per the “Collections Redesign” section of https://github.com/scala/scala/releases/tag/v2.13.0 and as per the 2.13 collections migration guide, https://docs.scala-lang.org/overviews/core/collections-migration-213.html
~And thus, it's likely this is working (and performing) as expected, rather than being a bug.~ (but see discussion below about the collection.Seq
case, which may be a bug)
In Scala 2.12, Seq
means collection.Seq
, which means it might be mutable or immutable.
In Scala 2.13, Seq
means collection.immutable.Seq
, which means it's definitely immutable.
Are you trying to write crossbuilt code that performs acceptably on both versions, or are you upgrading and just need good performance on 2.13?
Note that 2.13 offers scala.collection.immutable.ArraySeq.unsafeWrapArray
, which allows you to pass off an array that you promise never to mutate as an immutable.Seq
.
In Scala 2.12,
Seq
meanscollection.Seq
, which means it might be mutable or immutable.In Scala 2.13,
Seq
meanscollection.immutable.Seq
, which means it's definitely immutable.
I change the test code from val arr: Seq[Any] = new Array[Any](arraySize)
to val arr: scala.collection.Seq[Any] = new Array[Any](arraySize)
and re-run the test code, when using Scala 2.13, the class type of arr
change to class scala.collection.mutable.ArraySeq$ofRef
, but no change in performance compared with using Seq[Any]
.
Are you trying to write crossbuilt code that performs acceptably on both versions, or are you upgrading and just need good performance on 2.13?
Trying to write crossbuilt code that performs acceptably on both versions, the test scenario comes from Spark, and now it needs to support two Scala versions at the same time.
Note that 2.13 offers
scala.collection.immutable.ArraySeq.unsafeWrapArray
, which allows you to pass off an array that you promise never to mutate as animmutable.Seq
.
So for performance, I need to call immutable.ArraySeq.unsafeArray
or mutable.ArraySeq.array
instead of toArray
in Scala 2.13 when the code need crossbuilt?
I'd like to note that your original code yields the following warning:
On line 3: warning: method copyArrayToImmutableIndexedSeq in class LowPriorityImplicits2 is deprecated (since 2.13.0): Implicit conversions from Array to immutable.IndexedSeq are implemented by copying; Use the more efficient non-copying ArraySeq.unsafeWrapArray or an explicit toIndexedSeq call
But changing Seq
to collection.Seq
makes the warning go away.
You've written:
When call toArray on Array[Anry] as Seq[Any]: Using Scala 2.12, the class type of arr is mutable.WrappedArray$ofRef, the toAarry method did not trigger memory copy
Note that we can demonstrate that without benchmarking as follows:
scala 2.12.16> val a = Array[Any](0); a eq ((a: collection.Seq[Any]).toArray)
a: Array[Any] = Array(0)
res1: Boolean = true
scala 2.13.8> val a = Array[Any](0); a eq ((a: collection.Seq[Any]).toArray)
val a: Array[Any] = Array(0)
val res2: Boolean = false
That's a behavior change. Does the @scala/collections crew think there's something we ought to (and can) change in the Scala 2.13 collections to make this be true
again?
in particular, @LuciferYang has suggested:
should we rewrite
toArray
formutable.ArraySeq
in Scala 2.13 ?
is that feasible?
Note that we can demonstrate the problem with mutable.ArraySeq
without involving Seq
or Any
:
scala 2.13.8> val as = collection.mutable.ArraySeq(1); as.array eq as.toArray
val as: scala.collection.mutable.ArraySeq[Int] = ArraySeq(1)
val res7: Boolean = false
Note that I'm aware that I haven't answered your question about how to work around it when crossbuilding. It's a bit difficult to make suggestions since I don't know what constraints your original code is operating under — I don't know how drastically you're free to change it.
(Our main focus here in scala/bug should be on identifying and fixing bugs in Scala, but discussing workarounds is also valid.)
Note that I'm aware that I haven't answered your question about how to work around it when crossbuilding. It's a bit difficult to make suggestions since I don't know what constraints your original code is operating under — I don't know how drastically you're free to change it.
doesn't matter. Thanks a lot ~
Using Scala 2.12, the class type of
arr
ismutable.WrappedArray$ofRef
, thetoArray
method did not trigger memory copy
Honestly, if that's the case, it seems like a bug in 2.12. the toColl
methods have always produced new collections if called on a mutable collection—see Buffer#toBuffer
for example. It's certainly not a bug in 2.13 for it to create a copy of the underlying Array
.
Using Scala 2.12, the class type of
arr
ismutable.WrappedArray$ofRef
, thetoArray
method did not trigger memory copyHonestly, if that's the case, it seems like a bug in 2.12. the
toColl
methods have always produced new collections if called on a mutable collection—seeBuffer#toBuffer
for example. It's certainly not a bug in 2.13 for it to create a copy of the underlyingArray
.
So we should fix Scala 2.12 instead of Scala 2.13?
So we should fix Scala 2.12 instead of Scala 2.13?
I don't know. there is an understandable reluctance to change longstanding behaviour, especially if it can be argued that some users desire it. it probably depends if there's any documentation of the method that explicitly says "copy" or "new"