Spark harness should implement commands

Open coyotemarin opened this issue 6 years ago • 0 comments

For complete support of MRJob, it would be helpful for the Spark harness to be able to implement e.g. mapper_cmd() and mapper_pre_filter().

This isn't actually that difficult to do in Spark (dump input to a temp file, run a process with the file as stdin and read its stdout), but it may not be that useful. We'd basically be re-implementing Hadoop Streaming inside Spark.
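
A rough sketch of the approach described above (dump input to a temp file, run the command with that file as stdin, read its stdout). This is only an illustration, not mrjob code: `run_cmd_on_partition` is a hypothetical helper, and it assumes the command is a simple string that can be split with `shlex` and run without a shell.

```python
import shlex
import subprocess
import tempfile


def run_cmd_on_partition(lines, cmd):
    """Pipe an iterable of byte lines through a command, return its output lines.

    Hypothetical helper for illustration; not part of mrjob or Spark.
    """
    with tempfile.TemporaryFile() as tmp:
        # Dump the input to a temp file, one record per line.
        for line in lines:
            tmp.write(line if line.endswith(b'\n') else line + b'\n')
        tmp.flush()
        tmp.seek(0)

        # Run the command with the temp file as stdin and capture stdout.
        # check=True raises CalledProcessError on a non-zero exit status.
        proc = subprocess.run(
            shlex.split(cmd),
            stdin=tmp,
            stdout=subprocess.PIPE,
            check=True,
        )

    # Hand the command's output back as a list of byte lines.
    return proc.stdout.splitlines(keepends=True)
```

In Spark, a function like this could presumably be applied per partition, e.g. `rdd.mapPartitions(lambda part: run_cmd_on_partition(part, cmd))`, with `cmd` being whatever string the job's mapper_cmd() returns. Note that a filter such as `grep` exits non-zero when nothing matches, so a real version would need to decide how to treat exit codes.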

coyotemarin • Jan 12 '19 05:01