scalding
A Scala API for Cascading
I noticed that using the new `tally` API for counters won't create a new `diagnostic counter` (if I understand the naming correctly) if the associated TypedPipe has 0 elements. I'd...
``` [info] - the total number of steps is not more than cascading *** FAILED *** [info] TestFailedException was thrown during property evaluation. [info] Message: 5 was not less than...
One of our e2e tests fails when I try to use the develop branch: ``` Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 1 at cascading.tuple.TupleEntryChainIterator.next(TupleEntryChainIterator.java:79) at cascading.tuple.TupleEntryChainIterator.next(TupleEntryChainIterator.java:32) at scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:43) at scala.collection.Iterator$$anon$11.next(Iterator.scala:410)...
We have seen some issues with the scalding side-effect based counter API (see #1716). Scalding Operations use counters to log cache hit-rate effectiveness. We should just use the normal hadoop/cascading...
To test the modularity, a basic Spark backend based on RDDs should be implemented. It should be easy to do based on the cascading backend and the memory backend.
This error was observed running scalding 0.17.3. We do not yet know the cause. ``` at cascading.flow.hadoop.FlowMapper.run(FlowMapper.java:148) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:460) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:344) at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1962)...
We really want O(1) steps per partition, since we want to make sure cascading can plan it fast, but we still don't understand why the law is failing. It could...
We can take a series of writes in the Typed API, or we can use a FlowDef to create an Execution which has been broken into small pieces so cascading...
If we merge #1666 and continue that with putting Grouped, CoGrouped, and HashJoinable in the AST, we could do a number of optimizations, fairly easily if we steal the summingbird...
``` - all optimization rules do not increase steps arg0 = WithDescriptionTypedPipe(Mapped(WithDescriptionTypedPipe(CrossPipe(WithDescriptionTypedPipe(Filter(IterablePipe(List(1406023175)),),List((org.scalacheck.Gen$R$class.map(Gen.scala:237),true))),WithDescriptionTypedPipe(TrappedPipe(WithDescriptionTypedPipe(ForceToDisk(WithDescriptionTypedPipe(Mapped(WithDescriptionTypedPipe(FilterKeys(WithDescriptionTypedPipe(Mapped(WithDescriptionTypedPipe(ForceToDisk(WithDescriptionTypedPipe(MergedTypedPipe(IterablePipe(List(1)),WithDescriptionTypedPipe(Fork(IterablePipe(List(1))),List((org.scalacheck.Gen$R$class.map(Gen.scala:237),true)))),List((org.scalacheck.Gen$R$class.map(Gen.scala:237),true)))),List((org.scalacheck.Gen$R$class.map(Gen.scala:237),true))),),List((org.scalacheck.Gen$R$class.map(Gen.scala:237),true))),),List((org.scalacheck.Gen$R$class.map(Gen.scala:237),true))),),List((org.scalacheck.Gen$R$class.map(Gen.scala:237),true)))),List((org.scalacheck.Gen$R$class.map(Gen.scala:237),true))),com.twitter.scalding.source.FixedTypedText(frco8uwemnb3cyHuwd9Feqeqrsc6ceqhsEnNOUmhbnk1apqnjs2IhzxMcg),Single(com.twitter.scalding.TupleGetter$IntGetter$@20d3d9e3)),List((org.scalacheck.Gen$R$class.map(Gen.scala:237),true)))),List((org.scalacheck.Gen$R$class.map(Gen.scala:237),true))),),List((org.scalacheck.Gen$R$class.map(Gen.scala:237),true))), [info] arg1 = 
com.twitter.scalding.typed.OptimizationRules$ComposeFlatMap$@6091c00e.orElse(com.twitter.scalding.typed.OptimizationRules$ComposeMap$@66a991c1).orElse(com.twitter.scalding.typed.OptimizationRules$ComposeFilter$@6c399402).orElse(com.twitter.scalding.typed.OptimizationRules$ComposeWithOnComplete$@3f0e4827).orElse(com.twitter.scalding.typed.OptimizationRules$ComposeMapFlatMap$@51717e78).orElse(com.twitter.scalding.typed.OptimizationRules$ComposeFilterFlatMap$@7b7cca0e).orElse(com.twitter.scalding.typed.OptimizationRules$EmptyIterableIsEmpty$@5bc5d8e7).orElse(com.twitter.scalding.typed.OptimizationRules$DescribeLater$@1647ecc8).orElse(com.twitter.scalding.typed.OptimizationRules$DiamondToFlatMap$@4c1fc2c6).orElse(com.twitter.scalding.typed.OptimizationRules$RemoveDuplicateForceFork$@51ae5df1).orElse(com.twitter.scalding.typed.OptimizationRules$IgnoreNoOpGroup$@24d9a5c3).orElse(com.twitter.scalding.typed.OptimizationRules$DeferMerge$@d513c3c).orElse(com.twitter.scalding.typed.OptimizationRules$FilterKeysEarly$@3feac6a2).orElse(com.twitter.scalding.typed.OptimizationRules$FilterLocally$@1c08457b).orElse(com.twitter.scalding.typed.OptimizationRules$EmptyIsOftenNoOp$@75960c84) ``` giving the cascading stack: ``` [info] Cause: java.lang.NullPointerException: [info] at java.util.Objects.requireNonNull(Objects.java:203) [info] at...