scalding
ArrayIndexOutOfBoundsException in Cascading
One of our e2e tests fails when I try to use the develop branch:
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 1
at cascading.tuple.TupleEntryChainIterator.next(TupleEntryChainIterator.java:79)
at cascading.tuple.TupleEntryChainIterator.next(TupleEntryChainIterator.java:32)
at scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:43)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:410)
at scala.collection.Iterator$class.foreach(Iterator.scala:891)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
at com.twitter.scalding.typed.cascading_backend.AsyncFlowDefRunner$$anonfun$getIterable$1$$anon$1.foreach(AsyncFlowDefRunner.scala:360)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
at com.twitter.scalding.typed.cascading_backend.AsyncFlowDefRunner$$anonfun$getIterable$1$$anon$1.map(AsyncFlowDefRunner.scala:360)
at com.twitter.data_platform.e2e_testing.jobs.dal_keyval_source_summingbird.VerifyResultsExecutionApp$$anonfun$3.apply(VKVSTest.scala:104)
at com.twitter.data_platform.e2e_testing.jobs.dal_keyval_source_summingbird.VerifyResultsExecutionApp$$anonfun$3.apply(VKVSTest.scala:102)
at scala.util.Success$$anonfun$map$1.apply(Try.scala:237)
Considering that Iterator.foreach checks hasNext before calling next, it seems that TupleEntryChainIterator enters a bad state where currentIterator points to an invalid position.
I haven't been able to reproduce the cascading bug in isolation yet.
cc/ @johnynek
I wonder if the source you are dealing with has a bug with toIterator? We assume we can call that again and again, but maybe this source has an issue there?
It seems to be a bug in cascading. TupleEntryChainIterator should never throw if used correctly (hasNext and then next), which is the case.
I wonder if it is exhibited in cascading 2.7?
Also, why did we not trigger it before, when we do now?
I'd love to find a repro of this issue.
I've investigated this issue a little more. The bug is not in TupleEntryChainIterator, but in the underlying iterator implementation, HadoopTupleEntrySchemeIterator. Its hasNext returns true initially, but a second call to hasNext returns false, even before next is called.
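To illustrate how a non-idempotent hasNext in an underlying iterator could surface as an ArrayIndexOutOfBoundsException in a chaining iterator, here is a minimal sketch. It is a simplified model, not Cascading's actual code: ChainIteratorSketch, FlakyIterator, ChainIterator and probe are hypothetical names, and the assumption that the chaining iterator's next re-checks hasNext before reading from the current iterator is mine.

```java
import java.util.Iterator;
import java.util.List;

public class ChainIteratorSketch {

    /** Hypothetical source whose hasNext() is not idempotent: true on the first call, false afterwards. */
    static final class FlakyIterator implements Iterator<String> {
        private int hasNextCalls = 0;

        @Override
        public boolean hasNext() {
            return hasNextCalls++ == 0;
        }

        @Override
        public String next() {
            return "tuple";
        }
    }

    /** Simplified chaining iterator (a model for discussion, not the Cascading implementation). */
    static final class ChainIterator<T> implements Iterator<T> {
        private final Iterator<T>[] iterators;
        private int current = 0;

        @SafeVarargs
        ChainIterator(Iterator<T>... iterators) {
            this.iterators = iterators;
        }

        @Override
        public boolean hasNext() {
            // Advance past exhausted iterators until one reports more elements.
            while (current < iterators.length) {
                if (iterators[current].hasNext()) {
                    return true;
                }
                current++;
            }
            return false;
        }

        @Override
        public T next() {
            // Assumed behavior: re-check hasNext() to roll over to the next iterator.
            hasNext();
            // If the underlying hasNext() flipped to false, current is now past the end.
            return iterators[current].next();
        }
    }

    /** Uses the iterator correctly (hasNext, then next) and reports what happened. */
    static String probe(Iterator<String> it) {
        try {
            return it.hasNext() ? it.next() : "empty";
        } catch (ArrayIndexOutOfBoundsException e) {
            return "AIOOBE";
        }
    }
}
```

In this model, probe(new ChainIterator<>(new FlakyIterator())) returns "AIOOBE" even though the caller checks hasNext before next, while a well-behaved source such as List.of("a").iterator() yields "a".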
@fwbrasil is this a race condition in Hadoop? We have seen a few issues that look like race conditions there.