ArrayIndexOutOfBoundsException in Cascading

Open fwbrasil opened this issue 7 years ago • 7 comments

One of our e2e tests fails when I try to use the develop branch:

Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 1
	at cascading.tuple.TupleEntryChainIterator.next(TupleEntryChainIterator.java:79)
	at cascading.tuple.TupleEntryChainIterator.next(TupleEntryChainIterator.java:32)
	at scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:43)
	at scala.collection.Iterator$$anon$11.next(Iterator.scala:410)
	at scala.collection.Iterator$class.foreach(Iterator.scala:891)
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
	at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
	at com.twitter.scalding.typed.cascading_backend.AsyncFlowDefRunner$$anonfun$getIterable$1$$anon$1.foreach(AsyncFlowDefRunner.scala:360)
	at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
	at com.twitter.scalding.typed.cascading_backend.AsyncFlowDefRunner$$anonfun$getIterable$1$$anon$1.map(AsyncFlowDefRunner.scala:360)
	at com.twitter.data_platform.e2e_testing.jobs.dal_keyval_source_summingbird.VerifyResultsExecutionApp$$anonfun$3.apply(VKVSTest.scala:104)
	at com.twitter.data_platform.e2e_testing.jobs.dal_keyval_source_summingbird.VerifyResultsExecutionApp$$anonfun$3.apply(VKVSTest.scala:102)
	at scala.util.Success$$anonfun$map$1.apply(Try.scala:237)

Considering that Iterator.foreach checks if hasNext before calling next, it seems that TupleEntryChainIterator enters a bad state where currentIterator points to an invalid position.
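One way to narrow down where the contract breaks is to wrap a suspect iterator in a checker. The sketch below is a hypothetical debugging aid (the class name `ContractCheckingIterator` is made up for illustration and is not part of Cascading or Scalding). It assumes only the standard `java.util.Iterator` contract: `hasNext()` should keep giving the same answer until `next()` is called, and `next()` should succeed after `hasNext()` returned true.

```java
import java.util.Arrays;
import java.util.Iterator;

/** Hypothetical debugging aid: wraps an iterator and reports contract violations. */
final class ContractCheckingIterator<T> implements Iterator<T> {
    private final Iterator<T> delegate;
    // Answer of the most recent hasNext() since the last next(); null if none yet.
    private Boolean lastHasNext = null;

    ContractCheckingIterator(Iterator<T> delegate) {
        this.delegate = delegate;
    }

    @Override
    public boolean hasNext() {
        boolean now = delegate.hasNext();
        if (lastHasNext != null && lastHasNext != now) {
            throw new IllegalStateException(
                "hasNext() changed from " + lastHasNext + " to " + now
                    + " without a call to next()");
        }
        lastHasNext = now;
        return now;
    }

    @Override
    public T next() {
        boolean promised = lastHasNext != null && lastHasNext;
        lastHasNext = null;
        try {
            return delegate.next();
        } catch (RuntimeException e) {
            if (promised) {
                // The delegate said an element was available, then failed to deliver it.
                throw new IllegalStateException(
                    "next() failed after hasNext() returned true", e);
            }
            throw e;
        }
    }
}

final class ContractCheckDemo {
    public static void main(String[] args) {
        // A well-behaved iterator passes through unchanged.
        Iterator<Integer> it =
            new ContractCheckingIterator<>(Arrays.asList(1, 2, 3).iterator());
        while (it.hasNext()) {
            System.out.println(it.next());
        }
    }
}
```

Wrapping each layer in turn (the chain iterator, then the iterators it delegates to) would show which one first gives an inconsistent answer.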

I haven't been able to reproduce the cascading bug in isolation yet.

cc/ @johnynek

fwbrasil avatar Feb 09 '18 18:02 fwbrasil

I wonder if the source you are dealing with has a bug with toIterator? We assume we can call that again and again, but maybe this source has an issue there?

johnynek avatar Feb 09 '18 19:02 johnynek

It seems to be a bug in cascading. TupleEntryChainIterator should never throw if used correctly (hasNext and then next), which is the case.

fwbrasil avatar Feb 09 '18 19:02 fwbrasil

I wonder if it is exhibited in cascading 2.7?

johnynek avatar Feb 09 '18 19:02 johnynek

also: why did we not trigger it before, but now we do?

johnynek avatar Feb 09 '18 21:02 johnynek

I'd love to find a repro of this issue.

johnynek avatar Feb 20 '18 20:02 johnynek

I've investigated this issue a little more. The bug is not in TupleEntryChainIterator, but in the underlying iterator impl HadoopTupleEntrySchemeIterator. Its hasNext returns true initially but a second call to hasNext returns false, even before next is called.
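To illustrate how a non-idempotent hasNext can surface as an ArrayIndexOutOfBoundsException one layer up, here is a minimal sketch. Everything in it is a hypothetical stand-in: `FlakyIterator` imitates the behaviour described above (true on the first hasNext, false on the second, with no next in between), and `ChainIterator` is a simplified chain that re-checks hasNext inside next. It is not Cascading's actual code, only an assumption about the general shape of such a chain.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

final class HasNextRepro {
    /** Stand-in for the misbehaving source: one element, but hasNext() is true only on its first call. */
    static final class FlakyIterator implements Iterator<String> {
        private int hasNextCalls = 0;
        private boolean consumed = false;

        @Override
        public boolean hasNext() {
            hasNextCalls++;
            return !consumed && hasNextCalls == 1;
        }

        @Override
        public String next() {
            if (consumed) throw new NoSuchElementException();
            consumed = true;
            return "row";
        }
    }

    /** Simplified, hypothetical chain over several iterators. */
    static final class ChainIterator<T> implements Iterator<T> {
        private final Iterator<T>[] iterators;
        private int current = 0;

        @SafeVarargs
        ChainIterator(Iterator<T>... iterators) {
            this.iterators = iterators;
        }

        @Override
        public boolean hasNext() {
            while (current < iterators.length) {
                if (iterators[current].hasNext()) return true;
                current++; // child reports exhaustion, move to the next one
            }
            return false;
        }

        @Override
        public T next() {
            hasNext(); // re-checks the child, trusting it to repeat its earlier answer
            return iterators[current].next(); // current may now equal iterators.length
        }
    }

    /** Same call pattern as a foreach loop: hasNext(), then next(). Returns the element count. */
    static int drain(Iterator<?> it) {
        int n = 0;
        while (it.hasNext()) {
            it.next();
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        try {
            drain(new ChainIterator<String>(new FlakyIterator()));
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("reproduced: " + e);
        }
    }
}
```

In this sketch the caller follows the hasNext-then-next protocol correctly, yet the chain's index runs one past the end of its array because the child changed its answer between the two checks.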

fwbrasil avatar Apr 16 '18 20:04 fwbrasil

@fwbrasil is this a race condition in Hadoop? We have seen a few issues that look like that.

johnynek avatar Apr 16 '18 20:04 johnynek