cascalog
Data processing on Hadoop without the hassle.
I found a funky issue in Cascalog 2.0. When trying to use `:trap` with `c/ops` I get the following exception: ``` java.lang.Exception: java.lang.RuntimeException: Unable to resolve symbol: compare-fn in this context,...
I have made some modifications to Cascalog so that it runs on Tez.
cascalog.api/union and cascalog.api/combine only work with queries. A common pattern in our code is to do (combine (select-fields tap1 FIELDS) (select-fields tap2 FIELDS)). This construct used to work in 1.x,...
My profiling showed that managing thread bindings for `*op-call*` and `*flow-process*` vars carries a lot of overhead. In my test these calls accounted for about 16% of flow execution, excluding...
Note that this doesn't work for parallel aggregators because Cascading doesn't give access to keys in the AggregateBy.Functor.
Use of anonymous mapfns inside threaded queries leads to stack overflow exceptions. I assume this occurs when attempting to serialize the flow. I'm using 3.0.0-SNAPSHOT ``` clojure (def CONFIG {:a...
The last tutorial http://nathanmarz.com/blog/news-feed-in-38-lines-of-code-using-cascalog.html was interesting but is getting old (Clojure/Hadoop and Cascalog versions). For future versions of Cascalog, it would be great to have a default standalone archive (a...
Functions declared using `(prepfn)` are treated by Cascalog as vanilla Clojure functions and so behave as regular map/filter functions. If you want to make a prepared mapcat/buffer/aggregator you have to...
A JCascalog CascalogFilter that we are using isn't getting any fields passed in as part of the filterCall argument. Here is the filter: ``` java private static class FilterNullContentTypes extends...
I've noticed I get an exception when I use something like ``` clojure ((hfs-delimited "hfds:///a.txt") ?x ?y) ``` as a generator, and a.txt contains some lines where the second token...