cascalog
Data processing on Hadoop without the hassle.
`with-job-conf`'s bindings don't carry over to new threads, for example inside the checkpointing macro. Fix this inside checkpoint by using futures instead of threads.
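A minimal sketch of why futures would help, assuming `with-job-conf` is built on Clojure's `binding`. The var `*conf*` and both helper functions are hypothetical stand-ins, not Cascalog's own names. Clojure conveys thread-local bindings to `future` bodies, while a raw `Thread` only sees the var's root value.

```clojure
;; Hypothetical stand-in for the dynamic var behind with-job-conf.
(def ^:dynamic *conf* {})

(defn read-conf-in-thread
  "Reads *conf* from a raw java.lang.Thread. Bindings are not conveyed,
  so the thread sees the root value."
  []
  (let [result (promise)
        t (Thread. (fn [] (deliver result *conf*)))]
    (.start t)
    (.join t)
    @result))

(defn read-conf-in-future
  "Reads *conf* from a future. Clojure conveys the caller's bindings,
  so the future sees the bound value."
  []
  @(future *conf*))

(binding [*conf* {"io.sort.mb" "256"}]
  (println "thread sees:" (read-conf-in-thread))   ; root value
  (println "future sees:" (read-conf-in-future)))  ; bound value
```

If raw threads have to stay, wrapping the thread's function in `bound-fn` is another way to capture the caller's bindings.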
e.g., if c/count is done twice in one query (which can happen especially when predicate macros are involved)
e.g., Bixo's fetch pipe. Need to figure out how to parameterize things so it's clear how Cascalog connects in and connects out.
I am using JCascalog to create a query to read and process data from an HBase table. On execution, two mappers are getting created for all types of queries and...
Probably against the matrix of Clojure and Hadoop version combinations, given how easy this is with Travis. Here's some more information: http://about.travis-ci.org/docs/user/build-configuration/ I believe that Cascading's going to start testing...
The new serfn for Cascalog 2.0 works great when the same var definitions exist on the tasks as the machine submitting the job. This is the case for normal ETL...
e.g. ``` (> ?pivot)) ``` `:>>` into a var will capture the output into a nested tuple (just a seq of fields) Unclear how to handle nested serialization. Perhaps Cascading...
The logs have gotten a bit muddled of late.
Cascalog's errors are pretty bad and confuse new users. A non-exhaustive list of bad spots includes: 1. Trying to output to a local tap under Hadoop 2. Trying to write...
My notes from another ticket: I wanted to ask what you thought of allowing larger lazy sequences to be used with union and combine. I've got some code that you...