scalding

A Scala API for Cascading

102 scalding issues

Ideally, anything that does not add dependencies should be moved to base. By removing a function that no one appears to be using (and that is just a trivial redirection...

Problem: When debugging Beam jobs running on Dataflow, it is hard to map a failure in the Dataflow UI back to the corresponding line in the Scalding code. Solution: Assign...

```
[warn] /Users/oscar/code/oss/scalding/scalding-beam/src/main/scala/com/twitter/scalding/beam_backend/BeamBackend.scala:29:27: match may not be exhaustive.
[warn] It would fail on the following inputs: (CounterPipe(_), _), (CrossPipe(_, _), _), (CrossValue(_, _), _), (DebugPipe(_), _), (EmptyTypedPipe, _), (ForceToDisk(_), _),...
```
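This kind of warning appears when a match over a sealed hierarchy (here, a tuple of a pipe and some extra state) omits some subtypes; at runtime the missing cases surface as a bare `scala.MatchError`. The sketch below uses a hypothetical four-node ADT (`ReadPipe`, `MapPipe`, `MergePipe`, `EmptyPipe`), not scalding's real pipe types, to show one common remedy under that assumption: keep explicit cases for the nodes a backend supports and add a catch-all that fails with a descriptive error.

```scala
// Hypothetical, heavily simplified pipe ADT; not scalding's TypedPipe.
sealed trait Pipe
final case class ReadPipe(name: String) extends Pipe
final case class MapPipe(in: Pipe, fnName: String) extends Pipe
final case class MergePipe(left: Pipe, right: Pipe) extends Pipe
case object EmptyPipe extends Pipe

object Planner {
  // Matching on (pipe, depth) mirrors the tuple shape in the warning.
  def plan(p: Pipe, depth: Int): List[String] = (p, depth) match {
    case (ReadPipe(n), d)     => List(s"$d:read $n")
    case (MapPipe(in, f), d)  => s"$d:map $f" :: plan(in, d + 1)
    // The catch-all makes the match exhaustive, so scalac no longer warns,
    // and unsupported nodes fail with a readable message instead of MatchError.
    case (other, _) =>
      throw new UnsupportedOperationException(s"backend does not support: $other")
  }
}
```

Listing every supported node explicitly (rather than only using a wildcard) keeps the compiler's help when a new subtype is added and needs real support.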

This PR is the second part of https://github.com/twitter/scalding/pull/1754. It adds the `Quoted` implicit param to the user-facing APIs. After this change, the last PR will use the projection information to...

The goal here is to extract the Typed API of scalding as a generic dataflow AST. With this, we have a clean API to make scalding backends without depending on...
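A minimal sketch of what a backend-agnostic dataflow AST could look like, assuming a tiny hypothetical node set (`FromList`, `MapFlow`, `FilterFlow`, `MergeFlow`) rather than scalding's actual Typed API constructors. The point it illustrates is that once the user-facing API only builds a tree, each backend is just an interpreter over that tree: here, an in-memory evaluator and a plan describer.

```scala
// Hypothetical minimal dataflow AST; node names are illustrative only.
sealed trait Flow[A]
object Flow {
  final case class FromList[A](items: List[A]) extends Flow[A]
  final case class MapFlow[A, B](in: Flow[A], f: A => B) extends Flow[B]
  final case class FilterFlow[A](in: Flow[A], p: A => Boolean) extends Flow[A]
  final case class MergeFlow[A](left: Flow[A], right: Flow[A]) extends Flow[A]
}

// Backend 1: evaluate the tree eagerly in memory.
object ListBackend {
  import Flow._
  def run[A](flow: Flow[A]): List[A] = flow match {
    case FromList(items)   => items
    case MapFlow(in, f)    => run(in).map(f)
    case FilterFlow(in, p) => run(in).filter(p)
    case MergeFlow(l, r)   => run(l) ++ run(r)
  }
}

// Backend 2: walk the same tree but only describe the plan.
object DescribeBackend {
  import Flow._
  def describe[A](flow: Flow[A]): String = flow match {
    case FromList(items)   => s"FromList(${items.size})"
    case MapFlow(in, _)    => s"Map(${describe(in)})"
    case FilterFlow(in, _) => s"Filter(${describe(in)})"
    case MergeFlow(l, r)   => s"Merge(${describe(l)}, ${describe(r)})"
  }
}
```

A real backend (for example one targeting Beam or Cascading) would translate each node into that system's primitives instead of evaluating it directly; the AST itself would not need to depend on either.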

The current implementation of KryoCoder writes the class for every object to the output stream. (https://github.com/twitter/scalding/blob/b0ba993ac817e6b1e52126e8b1cfb1054cc00dad/scalding-beam/src/main/scala/com/twitter/scalding/beam_backend/KryoCoder.scala#L16) This was done because Beam can split the stream in between and if registration is only...

The PriorityQueue monoid mutates its input collections and is therefore not usable with BeamRunner. To mitigate that, we disable map-side aggregation for Beam. We should try using the cats PairingHeap implementation (https://github.com/typelevel/cats-collections/blob/master/core/src/main/scala/cats/collections/PairingHeap.scala).
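To show why a persistent heap avoids the mutation problem, here is a hand-rolled immutable pairing heap and a "keep the k largest" summary built on it. This is a sketch for illustration only; it is not the cats-collections `PairingHeap` API, and `Heap`, `TopK`, and their methods are hypothetical names. Combining two summaries builds new nodes and leaves both inputs untouched.

```scala
// Minimal persistent min pairing heap (illustrative, not cats-collections).
sealed trait Heap[A]
final case class EmptyHeap[A]() extends Heap[A]
final case class Node[A](min: A, children: List[Node[A]]) extends Heap[A]

object Heap {
  // Merging allocates a new root; neither input is modified.
  def merge[A](x: Heap[A], y: Heap[A])(implicit ord: Ordering[A]): Heap[A] =
    (x, y) match {
      case (EmptyHeap(), _) => y
      case (_, EmptyHeap()) => x
      case (a @ Node(ma, ca), b @ Node(mb, cb)) =>
        // The node with the smaller root adopts the other as a child.
        if (ord.lteq(ma, mb)) Node(ma, b :: ca) else Node(mb, a :: cb)
    }

  // Two-pass pairing: merge children pairwise, then combine the results.
  private def mergePairs[A](cs: List[Node[A]])(implicit ord: Ordering[A]): Heap[A] =
    cs match {
      case Nil            => EmptyHeap[A]()
      case a :: Nil       => a
      case a :: b :: rest => merge(merge(a, b), mergePairs(rest))
    }

  def pop[A](h: Heap[A])(implicit ord: Ordering[A]): Option[(A, Heap[A])] = h match {
    case EmptyHeap()  => None
    case Node(m, cs)  => Some((m, mergePairs(cs)))
  }
}

// Keeps the k largest values seen; the min-heap evicts the smallest first.
final case class TopK[A](k: Int, heap: Heap[A], size: Int) {
  def plus(that: TopK[A])(implicit ord: Ordering[A]): TopK[A] = {
    @annotation.tailrec
    def shrink(h: Heap[A], n: Int): (Heap[A], Int) =
      if (n <= k) (h, n)
      else Heap.pop(h) match {
        case Some((_, rest)) => shrink(rest, n - 1)
        case None            => (h, n) // only reachable if size is inconsistent
      }
    val (h, n) = shrink(Heap.merge(heap, that.heap), size + that.size)
    TopK(k, h, n)
  }

  def toAscendingList(implicit ord: Ordering[A]): List[A] = {
    @annotation.tailrec
    def loop(h: Heap[A], acc: List[A]): List[A] = Heap.pop(h) match {
      case Some((m, rest)) => loop(rest, m :: acc)
      case None            => acc.reverse
    }
    loop(heap, Nil)
  }
}

object TopK {
  def single[A](k: Int, a: A): TopK[A] = TopK(k, Node(a, Nil), 1)
}
```

Because `plus` never mutates its arguments, a runner that reuses or re-reads inputs (the situation that rules out the mutating PriorityQueue monoid) would still see the original values.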

We're considering enabling `OrderedSerialization` for most users at Twitter. Currently there is one blocker: users need to change their source code to enable it (and not...

Group randomly assigns records using `nextInt(numReducers)`, which results in huge reducer skew.
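One possible mitigation, sketched here as an assumption rather than the project's actual fix: within a single mapper, round-robin assignment bounds the difference between any two reducers' record counts to at most one, whereas independent `nextInt(numReducers)` draws only balance in expectation. `RoundRobinAssigner` and its per-mapper `startOffset` are hypothetical names; the offset is there so that different mappers do not all start at reducer 0.

```scala
// Hypothetical helper (not scalding API): deterministic round-robin
// reducer assignment for the records of one mapper.
final class RoundRobinAssigner(numReducers: Int, startOffset: Int) {
  require(numReducers > 0, "numReducers must be positive")

  // floorMod keeps the starting index in [0, numReducers) even for negative offsets.
  private var next: Int = Math.floorMod(startOffset, numReducers)

  def nextReducer(): Int = {
    val r = next
    next = (next + 1) % numReducers
    r
  }
}
```

The trade-off is that round-robin depends on record order and on how the offset is chosen per mapper; it balances counts, not bytes, so skew from very uneven record sizes would remain.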

Making the repl a little more useful out of the box.