mrjob
redo README
The mrjob README is pretty dated; it tries to sell mrjob as "the Python Hadoop streaming library" and doesn't talk about Spark features at all.
We should highlight things like:
- mrjob spark-submit
- archives supported across Spark installations
- setup scripts
- EMR cluster setup
- mix and match Spark and Hadoop Streaming
- Spark runner