mrjob
Spark harness should implement commands
For complete support of `MRJob`, it would be helpful for the Spark harness to be able to implement commands and pre-filters, e.g. `mapper_cmd()` and `mapper_pre_filter()`.
This isn't actually that difficult to do in Spark (dump input to a temp file, run a process with the file as stdin and read its stdout), but it may not be that useful. We'd basically be re-implementing Hadoop Streaming inside Spark.
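
A rough sketch of the temp-file approach described above, not what the harness does today. The helper name `run_cmd_on_partition` is hypothetical, and it assumes each partition arrives as an iterable of byte lines and that the command can be split with `shlex` (no shell pipes or redirects):

```python
import shlex
import subprocess
import tempfile


def run_cmd_on_partition(lines, cmd):
    """Hypothetical helper: pipe one partition's lines through a command.

    Dumps the input lines to a temp file, runs cmd with that file as
    stdin, and yields the lines the command writes to stdout.
    """
    with tempfile.TemporaryFile() as tmp:
        # dump input to a temp file, one record per line
        for line in lines:
            tmp.write(line.rstrip(b'\n') + b'\n')
        tmp.seek(0)

        # run the process with the temp file as its stdin
        proc = subprocess.Popen(
            shlex.split(cmd), stdin=tmp, stdout=subprocess.PIPE)
        try:
            # stream the command's stdout back as output records
            for out_line in proc.stdout:
                yield out_line.rstrip(b'\n')
        finally:
            proc.stdout.close()
            returncode = proc.wait()

        # assumption: treat any non-zero exit status as a failure
        if returncode != 0:
            raise subprocess.CalledProcessError(returncode, cmd)
```

In the harness this could presumably be applied per partition, e.g. `rdd.mapPartitions(lambda lines: run_cmd_on_partition(lines, cmd))`, with the command string taken from `mapper_cmd()`. PySpark's built-in `RDD.pipe()` covers similar ground and may be enough for the simple cases.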