cascalog
Make job conf configurable on a subquery-by-subquery basis
Rather than on a per-executed-query basis.
I suggest we add a :job-conf option predicate to subqueries. It would take a map of properties. Here is a note from Chris Wensel about a feature that's just been added to Cascading 2.0:
Just added FlowStepStrategy to the next wip. It allows you to change the current flow step configuration (by calling #getConf) on the FlowStep instance.
/**
 * The FlowStepStrategy interface allows for on-the-fly customization of {@link FlowStep} configuration values
 * before they are submitted to the underlying platform.
 * Use a strategy instance to change the display name for a job, or in the case of Hadoop, the number of
 * mapper or reducer instances.
 * Note, to change the configuration information, {@link cascading.flow.planner.FlowStep#getConf()} must be
 * called to get access to the current configuration. Calling {@link FlowStep#setName(String)} would have no effect.
 * If any, the completed predecessor steps are provided so that the predecessors can be inspected via the
 * {@link cascading.stats.FlowStepStats} interface for any information that may influence the current job.
 * It is also possible to block submission of the job by blocking in this method.
 */
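The hook described in that Javadoc can be modelled without Cascading on the classpath. The sketch below uses hypothetical stand-ins (`Step`, `StepStrategy`, `PerSubqueryConfStrategy`, `runAll`) that only mirror the described shape: the strategy is called once per step before submission, receives the completed predecessor steps, and mutates the step's own configuration in place. It assumes each step can be matched to its originating subquery by name; how Cascalog would actually map flow steps to subqueries is an open question, and these are not the real Cascading classes.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins modelled on the FlowStepStrategy description.
// These are NOT the Cascading classes.
public final class StrategySketch {

    // A step about to be submitted, carrying a mutable configuration.
    static final class Step {
        final String name;
        final Map<String, String> conf = new HashMap<>();
        Step(String name) { this.name = name; }
    }

    // Called once per step, before submission, with the completed predecessors.
    interface StepStrategy {
        void apply(List<Step> predecessors, Step step);
    }

    // Applies per-subquery overrides, keyed (by assumption) on the step name.
    static final class PerSubqueryConfStrategy implements StepStrategy {
        private final Map<String, Map<String, String>> overrides;

        PerSubqueryConfStrategy(Map<String, Map<String, String>> overrides) {
            this.overrides = overrides;
        }

        @Override
        public void apply(List<Step> predecessors, Step step) {
            Map<String, String> extra = overrides.get(step.name);
            if (extra != null) {
                step.conf.putAll(extra); // edit the step's own conf, not a copy
            }
        }
    }

    // Runs steps in order, invoking the strategy just before each "submission".
    static void runAll(List<Step> steps, StepStrategy strategy) {
        List<Step> completed = new ArrayList<>();
        for (Step step : steps) {
            strategy.apply(Collections.unmodifiableList(completed), step);
            completed.add(step); // pretend the step ran successfully
        }
    }

    public static void main(String[] args) {
        Step join = new Step("join-subquery");
        Step count = new Step("count-subquery");
        join.conf.put("mapred.reduce.tasks", "10");
        count.conf.put("mapred.reduce.tasks", "10");

        Map<String, Map<String, String>> overrides = new HashMap<>();
        overrides.put("join-subquery", Collections.singletonMap("mapred.reduce.tasks", "50"));

        List<Step> steps = new ArrayList<>();
        steps.add(join);
        steps.add(count);
        runAll(steps, new PerSubqueryConfStrategy(overrides));

        System.out.println(join.conf.get("mapred.reduce.tasks"));  // 50
        System.out.println(count.conf.get("mapred.reduce.tasks")); // 10
    }
}
```

Only the step whose name has an entry in the override map is changed; the other step keeps the query-wide value. In Cascading itself the strategy would be registered on the flow (for example via `Flow#setFlowStepStrategy`, if that method is present in the release you use) rather than driven by a loop like `runAll`.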
With the release of 2.0 (congrats, btw!), is anyone working on this? I'd be happy to give it a go; necessity is the mother of invention, and I need this quite badly!
No one is actively working on it that I'm aware of. A PR for this would be awesome.
Adding a conf option predicate would be the way to go. This'd be great.
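A minimal sketch of the proposed semantics, assuming a subquery's conf map is layered over the properties supplied for the whole executed query, so that a subquery without its own map simply inherits them. `SubqueryConf` and `effectiveConf` are hypothetical names, and the property keys are example Hadoop 1.x settings rather than anything Cascalog defines:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of per-subquery job conf: a subquery's own
// properties are layered over the properties of the whole executed query.
public final class SubqueryConf {

    private SubqueryConf() {}

    // Returns a new map containing queryConf with subqueryConf applied on top.
    // Neither input map is modified, so sibling subqueries are unaffected.
    public static Map<String, String> effectiveConf(Map<String, String> queryConf,
                                                    Map<String, String> subqueryConf) {
        Map<String, String> merged = new HashMap<>(queryConf);
        merged.putAll(subqueryConf); // subquery values win on key collisions
        return merged;
    }

    public static void main(String[] args) {
        Map<String, String> queryConf = new HashMap<>();
        queryConf.put("mapred.reduce.tasks", "10");
        queryConf.put("mapred.child.java.opts", "-Xmx512m");

        // One expensive subquery asks for more reducers; another sets nothing.
        Map<String, String> heavy = Collections.singletonMap("mapred.reduce.tasks", "50");
        Map<String, String> light = Collections.<String, String>emptyMap();

        Map<String, String> heavyConf = effectiveConf(queryConf, heavy);
        Map<String, String> lightConf = effectiveConf(queryConf, light);

        System.out.println(heavyConf.get("mapred.reduce.tasks"));    // 50
        System.out.println(heavyConf.get("mapred.child.java.opts")); // -Xmx512m
        System.out.println(lightConf.get("mapred.reduce.tasks"));    // 10
    }
}
```

Returning a fresh map instead of mutating the query-wide one keeps one subquery's overrides from leaking into the jobs of its siblings.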
Sounds good. I'll get caught up with the changes in 2.0, then submit a quick overview for a sanity check before digging in.