mrjob
Ability to specify protocols for a specific step via MRStep
It would be great to be able to explicitly assign an input, output, or internal protocol to an individual step in the job's steps() method. For example:
    def steps(self):
        return [MRStep(mapper=None, reducer=None, output_protocol=FooBar),
                MRStep(mapper=None, reducer=None, output_protocol=Baz)]
The input_protocol for the second step would be inferred from the output_protocol set on the previous step.
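To make the inference rule concrete, here is a minimal standalone sketch. It does not use mrjob and is not how mrjob works today: `StepSpec` and `resolve_protocols` are hypothetical names invented for illustration, and protocols are represented as plain strings. The fallback to job-level input/internal/output protocols for steps that set nothing is an assumption modeled on the existing job-wide protocol settings.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class StepSpec:
    """Hypothetical stand-in for MRStep; only the protocol fields matter here."""
    name: str
    input_protocol: Optional[str] = None
    output_protocol: Optional[str] = None


def resolve_protocols(steps: List[StepSpec], job_input: str,
                      job_internal: str, job_output: str) -> List[Tuple[str, str]]:
    """Return the (input, output) protocol pair each step would use."""
    resolved = []
    prev_output = None
    for i, step in enumerate(steps):
        is_last = i == len(steps) - 1
        # Input: an explicit per-step value wins, otherwise inherit the
        # previous step's output, otherwise use the job-level input protocol.
        if step.input_protocol is not None:
            in_proto = step.input_protocol
        elif prev_output is not None:
            in_proto = prev_output
        else:
            in_proto = job_input
        # Output: an explicit per-step value wins, otherwise fall back to the
        # job-level output (last step) or internal (earlier steps) protocol.
        if step.output_protocol is not None:
            out_proto = step.output_protocol
        else:
            out_proto = job_output if is_last else job_internal
        resolved.append((in_proto, out_proto))
        prev_output = out_proto
    return resolved


steps = [StepSpec("first", output_protocol="FooBar"),
         StepSpec("second", output_protocol="Baz")]
resolved = resolve_protocols(steps, job_input="RawValue",
                             job_internal="JSON", job_output="JSON")
```

In this sketch an explicit input_protocol silently overrides the inferred one. A real implementation might instead raise an error when it disagrees with the previous step's output_protocol.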
This especially makes sense if some of your steps are JarSteps.
Updated to use MRStep rather than self.mr(), for consistency with #815.
Also pretty topical now that we're working on Spark.