bug
bug copied to clipboard
Performance gap of calling `toArray` on `Array[Any]` as `Seq[Any]` between using Scala 2.12 and Scala 2.13
Reproduction steps
Scala version: Scala 2.13.8
val valuesPerIteration: Long = 1000 * 1000 * 10
val arraySize = 1000
val arr: Seq[Any] = new Array[Any](arraySize)
var n = 0
while (n < valuesPerIteration) {
val array = arr.toArray
n += 1
}
Problem
When using Scala 2.12: 0.3 ns per row
OpenJDK 64-Bit Server VM 1.8.0_322-b06 on Mac OS X 11.4
Apple M1
toArray: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
arrayOfAnyAsSeq 3 3 0 3203.9 0.3 1.0X
When using Scala 2.13: 285.3 ns per row
OpenJDK 64-Bit Server VM 1.8.0_322-b06 on Mac OS X 11.4
Apple M1
toArray: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
arrayOfAnyAsSeq 2853 3179 461 3.5 285.3 1.0X
When the arraySize is 1000, the above scenario is 951 times slower when using Scala 2.13 than when using Scala 2.12, and with the increase of 'arraySize', the performance gap will also increase.
When call toArray on Array[Anry] as Seq[Any]:
- Using Scala 2.12, the class type of
arrismutable.WrappedArray$ofRef, thetoAarrymethod did not trigger memory copy - Using Scala 2.13, the class type of
arrisimmutable.ArraySeq$ofRef, thetoAarrymethod trigger memory copy
So should we rewrite toAarry for immutable.ArraySeq and mutable.ArraySeq in Scala 2.13 ?
cc @SethTisue
@scala/collections ?
Offhand, this looks like a result of Seq meaning something different in Scala 2.12 and 2.13, as per the “Collections Redesign” section of https://github.com/scala/scala/releases/tag/v2.13.0 and as per the 2.13 collections migration guide, https://docs.scala-lang.org/overviews/core/collections-migration-213.html
~And thus, it's likely this is working (and performing) as expected, rather than being a bug.~ (but see discussion below about the collection.Seq case, which may be a bug)
In Scala 2.12, Seq means collection.Seq, which means it might be mutable or immutable.
In Scala 2.13, Seq means collection.immutable.Seq, which means it's definitely immutable.
Are you trying to write crossbuilt code that performs acceptably on both versions, or are you upgrading and just need good performance on 2.13?
Note that 2.13 offers scala.collection.immutable.ArraySeq.unsafeWrapArray, which allows you to pass off an array that you promise never to mutate as an immutable.Seq.
In Scala 2.12,
Seqmeanscollection.Seq, which means it might be mutable or immutable.In Scala 2.13,
Seqmeanscollection.immutable.Seq, which means it's definitely immutable.
I change the test code from val arr: Seq[Any] = new Array[Any](arraySize) to val arr: scala.collection.Seq[Any] = new Array[Any](arraySize) and re-run the test code, when using Scala 2.13, the class type of arr change to class scala.collection.mutable.ArraySeq$ofRef, but no change in performance compared with using Seq[Any].
Are you trying to write crossbuilt code that performs acceptably on both versions, or are you upgrading and just need good performance on 2.13?
Trying to write crossbuilt code that performs acceptably on both versions, the test scenario comes from Spark, and now it needs to support two Scala versions at the same time.
Note that 2.13 offers
scala.collection.immutable.ArraySeq.unsafeWrapArray, which allows you to pass off an array that you promise never to mutate as animmutable.Seq.
So for performance, I need to call immutable.ArraySeq.unsafeArray or mutable.ArraySeq.array instead of toArray in Scala 2.13 when the code need crossbuilt?
I'd like to note that your original code yields the following warning:
On line 3: warning: method copyArrayToImmutableIndexedSeq in class LowPriorityImplicits2 is deprecated (since 2.13.0): Implicit conversions from Array to immutable.IndexedSeq are implemented by copying; Use the more efficient non-copying ArraySeq.unsafeWrapArray or an explicit toIndexedSeq call
But changing Seq to collection.Seq makes the warning go away.
You've written:
When call toArray on Array[Anry] as Seq[Any]: Using Scala 2.12, the class type of arr is mutable.WrappedArray$ofRef, the toAarry method did not trigger memory copy
Note that we can demonstrate that without benchmarking as follows:
scala 2.12.16> val a = Array[Any](0); a eq ((a: collection.Seq[Any]).toArray)
a: Array[Any] = Array(0)
res1: Boolean = true
scala 2.13.8> val a = Array[Any](0); a eq ((a: collection.Seq[Any]).toArray)
val a: Array[Any] = Array(0)
val res2: Boolean = false
That's a behavior change. Does the @scala/collections crew think there's something we ought to (and can) change in the Scala 2.13 collections to make this be true again?
in particular, @LuciferYang has suggested:
should we rewrite
toArrayformutable.ArraySeqin Scala 2.13 ?
is that feasible?
Note that we can demonstrate the problem with mutable.ArraySeq without involving Seq or Any:
scala 2.13.8> val as = collection.mutable.ArraySeq(1); as.array eq as.toArray
val as: scala.collection.mutable.ArraySeq[Int] = ArraySeq(1)
val res7: Boolean = false
Note that I'm aware that I haven't answered your question about how to work around it when crossbuilding. It's a bit difficult to make suggestions since I don't know what constraints your original code is operating under — I don't know how drastically you're free to change it.
(Our main focus here in scala/bug should be on identifying and fixing bugs in Scala, but discussing workarounds is also valid.)
Note that I'm aware that I haven't answered your question about how to work around it when crossbuilding. It's a bit difficult to make suggestions since I don't know what constraints your original code is operating under — I don't know how drastically you're free to change it.
doesn't matter. Thanks a lot ~
Using Scala 2.12, the class type of
arrismutable.WrappedArray$ofRef, thetoArraymethod did not trigger memory copy
Honestly, if that's the case, it seems like a bug in 2.12. the toColl methods have always produced new collections if called on a mutable collection—see Buffer#toBuffer for example. It's certainly not a bug in 2.13 for it to create a copy of the underlying Array.
Using Scala 2.12, the class type of
arrismutable.WrappedArray$ofRef, thetoArraymethod did not trigger memory copyHonestly, if that's the case, it seems like a bug in 2.12. the
toCollmethods have always produced new collections if called on a mutable collection—seeBuffer#toBufferfor example. It's certainly not a bug in 2.13 for it to create a copy of the underlyingArray.
So we should fix Scala 2.12 instead of Scala 2.13?
So we should fix Scala 2.12 instead of Scala 2.13?
I don't know. there is an understandable reluctance to change longstanding behaviour, especially if it can be argued that some users desire it. it probably depends if there's any documentation of the method that explicitly says "copy" or "new"