bug
bug copied to clipboard
IterableOnceOps.sum creates multiple iterators
Reproduction steps
Scala version: 2.13.6
val someSeq = Seq(1).toSeq
for (i <- 1 to 100_000) {
println(someSeq.view.filter(i => Random.nextBoolean()).sum)
}
The above is very likely to produce a stack trace like this:
java.lang.UnsupportedOperationException: empty.reduceLeft
at scala.collection.IterableOnceOps.reduceLeft(IterableOnce.scala:727)
at scala.collection.IterableOnceOps.reduceLeft$(IterableOnce.scala:724)
at scala.collection.AbstractIterable.reduceLeft(Iterable.scala:926)
at scala.collection.IterableOnceOps.reduce(IterableOnce.scala:698)
at scala.collection.IterableOnceOps.reduce$(IterableOnce.scala:698)
at scala.collection.AbstractIterable.reduce(Iterable.scala:926)
at scala.collection.IterableOnceOps.sum(IterableOnce.scala:889)
at scala.collection.IterableOnceOps.sum$(IterableOnce.scala:889)
at scala.collection.AbstractIterable.sum(Iterable.scala:926)
<snip>
Problem
IterableOnceOps.sum
creates two different iterators for the source collection. Those iterators can be out of sync if the data source is lazy. This can cause exceptions to be thrown from sum
.
In order to determine if the source collection is empty, sum
creates an iterator and calls hasNext
. If that iterator is non-empty, sum
delegates to reduce(num.plus)
, which creates a new, separate iterator to compute the sum. reduce
fails on empty iterators, and for lazy collections involving filters, there is no guarantee that two separate iterators will agree on the state of the source collection.
I think this can be fixed by making sum
create an iterator it can pass to reduce
, instead of having the is-empty check create an iterator that isn't used by the reduce operation.
I would expect it to use foldLeft(num.zero)(num.plus)
directly.
But a more interesting question is what to do with all the other methods which use the same pattern with isEmpty
?
Does this fall under the caveat for isEmpty
, broadly construed:
* Note: Implementations in subclasses that are not repeatedly traversable must take
* care not to consume any elements when `isEmpty` is called.
Naively, I would expect the underlying to know what has been evaluated.
@scala/collections ?
it's hard for me to decide whether or not this is even a bug. it's probably not hard to fix, but from the perspective of isEmpty
's implementation, you've changed the collection during isEmpty
's execution, and I'm not sure why you expected that to go any way but poorly
from the perspective of isEmpty's implementation, you've changed the collection during isEmpty's execution, and I'm not sure why you expected that to go any way but poorly
Okay, but you do see how this behavior makes View
somewhat dangerous to use? Also isEmpty
is in a class called IterableOnceOps
, which I take to mean the code should assume the underlying collection can be iterated only once, so creating two different iterators and calling hasNext
on both seems like a bad idea in any case. Some types of iterator may use hasNext
to load the next item from the underlying source that we may only be able to read once into the iterator, so calling hasNext
on two different new iterators isn't guaranteed to give the same result.
This is a really unfortunate feature interaction between view
not memoizing and IterableOnceOps
assuming that if it keeps asking for iterators it keeps getting the same iterator.
There's no justification for the library to have a footgun like this implemented in it. It's conceptually trivial to fix: for every operation, all IterableOnceOps
should do is get an iterator and defer to it. In practice, switching every use of isEmpty
to knownSize == 0
and then calling a method that does the empty-check again might be less churn on the code and might perform better in case of collection that knows when it's empty. Critically, if you need to get a new iterator, knownSize
returns -1.
For instance, sum
would be implemented as
def sum[B >: A](implicit num: Numeric[B]): B =
if (knownSize == 0) num.zero else iterator.sum
The extension method was always just iterator.sum
. This code was apparently an attempt at optimization that went awry. product
, most of the min
and max
code, and mkString
all seem to have the same unfortunate optimization.
The PR follows Ichoran. The hiccup is an existing optimization for IndexedSeq
for binary compat reasons.
In other archaeology, I'm sorry I missed the ticket where Ichoran said, "You can close this as not a bug, but the whole design is a bug." Because of math context, BigDecimal ops are not commutative. That also answers joroKr's question why isn't it foldLeft.
It just took me a second to wonder why I didn't break that case again: it still uses reduction and the z
value only on empty.
Also, breaking the case is good for detectives, bad for programmers.