bug icon indicating copy to clipboard operation
bug copied to clipboard

Performance gap of calling `toArray` on `Array[Any]` as `Seq[Any]` between using Scala 2.12 and Scala 2.13

Open LuciferYang opened this issue 2 years ago • 13 comments

Reproduction steps

Scala version: Scala 2.13.8

val valuesPerIteration: Long = 1000 * 1000 * 10
val arraySize = 1000
val arr: Seq[Any] = new Array[Any](arraySize)
var n = 0
while (n < valuesPerIteration) {
    val array = arr.toArray
    n += 1
}

Problem

When using Scala 2.12: 0.3 ns per row

OpenJDK 64-Bit Server VM 1.8.0_322-b06 on Mac OS X 11.4
Apple M1
toArray:                                  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
arrayOfAnyAsSeq                                       3              3           0       3203.9           0.3       1.0X

When using Scala 2.13: 285.3 ns per row

OpenJDK 64-Bit Server VM 1.8.0_322-b06 on Mac OS X 11.4
Apple M1
toArray:                                  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
arrayOfAnyAsSeq                                    2853           3179         461          3.5         285.3       1.0X

When the arraySize is 1000, the above scenario is 951 times slower when using Scala 2.13 than when using Scala 2.12, and with the increase of 'arraySize', the performance gap will also increase.

LuciferYang avatar Jul 14 '22 03:07 LuciferYang

When call toArray on Array[Anry] as Seq[Any]:

  • Using Scala 2.12, the class type of arr is mutable.WrappedArray$ofRef, the toAarry method did not trigger memory copy
  • Using Scala 2.13, the class type of arr is immutable.ArraySeq$ofRef, the toAarry method trigger memory copy

So should we rewrite toAarry for immutable.ArraySeq and mutable.ArraySeq in Scala 2.13 ?

cc @SethTisue

LuciferYang avatar Jul 14 '22 03:07 LuciferYang

@scala/collections ?

SethTisue avatar Jul 14 '22 03:07 SethTisue

Offhand, this looks like a result of Seq meaning something different in Scala 2.12 and 2.13, as per the “Collections Redesign” section of https://github.com/scala/scala/releases/tag/v2.13.0 and as per the 2.13 collections migration guide, https://docs.scala-lang.org/overviews/core/collections-migration-213.html

~And thus, it's likely this is working (and performing) as expected, rather than being a bug.~ (but see discussion below about the collection.Seq case, which may be a bug)

In Scala 2.12, Seq means collection.Seq, which means it might be mutable or immutable.

In Scala 2.13, Seq means collection.immutable.Seq, which means it's definitely immutable.

Are you trying to write crossbuilt code that performs acceptably on both versions, or are you upgrading and just need good performance on 2.13?

Note that 2.13 offers scala.collection.immutable.ArraySeq.unsafeWrapArray, which allows you to pass off an array that you promise never to mutate as an immutable.Seq.

SethTisue avatar Jul 14 '22 03:07 SethTisue

In Scala 2.12, Seq means collection.Seq, which means it might be mutable or immutable.

In Scala 2.13, Seq means collection.immutable.Seq, which means it's definitely immutable.

I change the test code from val arr: Seq[Any] = new Array[Any](arraySize) to val arr: scala.collection.Seq[Any] = new Array[Any](arraySize) and re-run the test code, when using Scala 2.13, the class type of arr change to class scala.collection.mutable.ArraySeq$ofRef, but no change in performance compared with using Seq[Any].

LuciferYang avatar Jul 14 '22 04:07 LuciferYang

Are you trying to write crossbuilt code that performs acceptably on both versions, or are you upgrading and just need good performance on 2.13?

Trying to write crossbuilt code that performs acceptably on both versions, the test scenario comes from Spark, and now it needs to support two Scala versions at the same time.

LuciferYang avatar Jul 14 '22 04:07 LuciferYang

Note that 2.13 offers scala.collection.immutable.ArraySeq.unsafeWrapArray, which allows you to pass off an array that you promise never to mutate as an immutable.Seq.

So for performance, I need to call immutable.ArraySeq.unsafeArray or mutable.ArraySeq.array instead of toArray in Scala 2.13 when the code need crossbuilt?

LuciferYang avatar Jul 14 '22 05:07 LuciferYang

I'd like to note that your original code yields the following warning:

On line 3: warning: method copyArrayToImmutableIndexedSeq in class LowPriorityImplicits2 is deprecated (since 2.13.0): Implicit conversions from Array to immutable.IndexedSeq are implemented by copying; Use the more efficient non-copying ArraySeq.unsafeWrapArray or an explicit toIndexedSeq call

But changing Seq to collection.Seq makes the warning go away.

You've written:

When call toArray on Array[Anry] as Seq[Any]: Using Scala 2.12, the class type of arr is mutable.WrappedArray$ofRef, the toAarry method did not trigger memory copy

Note that we can demonstrate that without benchmarking as follows:

scala 2.12.16> val a = Array[Any](0); a eq ((a: collection.Seq[Any]).toArray)
a: Array[Any] = Array(0)
res1: Boolean = true
scala 2.13.8> val a = Array[Any](0); a eq ((a: collection.Seq[Any]).toArray)
val a: Array[Any] = Array(0)
val res2: Boolean = false

That's a behavior change. Does the @scala/collections crew think there's something we ought to (and can) change in the Scala 2.13 collections to make this be true again?

SethTisue avatar Jul 14 '22 13:07 SethTisue

in particular, @LuciferYang has suggested:

should we rewrite toArray for mutable.ArraySeq in Scala 2.13 ?

is that feasible?

Note that we can demonstrate the problem with mutable.ArraySeq without involving Seq or Any:

scala 2.13.8> val as = collection.mutable.ArraySeq(1); as.array eq as.toArray
val as: scala.collection.mutable.ArraySeq[Int] = ArraySeq(1)
val res7: Boolean = false

SethTisue avatar Jul 14 '22 14:07 SethTisue

Note that I'm aware that I haven't answered your question about how to work around it when crossbuilding. It's a bit difficult to make suggestions since I don't know what constraints your original code is operating under — I don't know how drastically you're free to change it.

(Our main focus here in scala/bug should be on identifying and fixing bugs in Scala, but discussing workarounds is also valid.)

SethTisue avatar Jul 14 '22 15:07 SethTisue

Note that I'm aware that I haven't answered your question about how to work around it when crossbuilding. It's a bit difficult to make suggestions since I don't know what constraints your original code is operating under — I don't know how drastically you're free to change it.

doesn't matter. Thanks a lot ~

LuciferYang avatar Jul 15 '22 03:07 LuciferYang

Using Scala 2.12, the class type of arr is mutable.WrappedArray$ofRef, the toArray method did not trigger memory copy

Honestly, if that's the case, it seems like a bug in 2.12. the toColl methods have always produced new collections if called on a mutable collection—see Buffer#toBuffer for example. It's certainly not a bug in 2.13 for it to create a copy of the underlying Array.

NthPortal avatar Aug 23 '22 10:08 NthPortal

Using Scala 2.12, the class type of arr is mutable.WrappedArray$ofRef, the toArray method did not trigger memory copy

Honestly, if that's the case, it seems like a bug in 2.12. the toColl methods have always produced new collections if called on a mutable collection—see Buffer#toBuffer for example. It's certainly not a bug in 2.13 for it to create a copy of the underlying Array.

So we should fix Scala 2.12 instead of Scala 2.13?

LuciferYang avatar Aug 24 '22 06:08 LuciferYang

So we should fix Scala 2.12 instead of Scala 2.13?

I don't know. there is an understandable reluctance to change longstanding behaviour, especially if it can be argued that some users desire it. it probably depends if there's any documentation of the method that explicitly says "copy" or "new"

NthPortal avatar Aug 24 '22 07:08 NthPortal