
Integration with Amazon EMR

Open igorgatis opened this issue 11 years ago • 0 comments

Sounds like all that's needed is a new backend that talks to the S3 file system and controls EMR jobflows (via the boto API).

Essential features:

  • Read input from and write output to S3.
  • Create new jobflow or reuse existing one.
  • Options to specify the number of instances and their types (e.g. m1.medium).
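
To make the essential features concrete, here is a rough sketch of the options such a backend might accept. Everything in it is an assumption for illustration: `EmrOptions`, `parse_s3_uri`, and the field names are hypothetical, not part of dumbo or boto, and no AWS calls are made.

```python
from dataclasses import dataclass
from typing import Optional, Tuple
from urllib.parse import urlparse


def parse_s3_uri(uri: str) -> Tuple[str, str]:
    """Split 's3://bucket/key' into (bucket, key); reject anything else."""
    parsed = urlparse(uri)
    if parsed.scheme not in ("s3", "s3n") or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


@dataclass
class EmrOptions:
    """Hypothetical option set for an EMR backend (names are assumptions)."""
    input_uri: str                    # where the job reads from, on S3
    output_uri: str                   # where the job writes to, on S3
    num_instances: int = 2            # total instances in the jobflow
    instance_type: str = "m1.medium"  # instance type for all instances
    jobflow_id: Optional[str] = None  # None means "create a new jobflow"

    def validate(self) -> None:
        # Both ends of the job must live on S3.
        parse_s3_uri(self.input_uri)
        parse_s3_uri(self.output_uri)
        if self.num_instances < 1:
            raise ValueError("num_instances must be at least 1")

    @property
    def reuse_existing(self) -> bool:
        # True when the caller supplied an existing jobflow to attach to.
        return self.jobflow_id is not None
```

A backend could call `validate()` up front and then branch on `reuse_existing` to decide whether to start a jobflow or add a step to a running one.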

Nice to have:

  • Automatic upload of local input files to S3.
  • Change the number of worker instances.
  • Support for spot instances.
  • Resource estimator for future runs (e.g. try with a sample, then estimate how long the full run will take).
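
The resource estimator could start out as a simple extrapolation. The sketch below assumes runtime scales linearly with input size once a fixed startup cost is subtracted, which real jobs may not satisfy; `estimate_full_runtime` and its parameters are hypothetical names.

```python
def estimate_full_runtime(sample_bytes, sample_seconds, full_bytes,
                          startup_seconds=0.0):
    """Extrapolate a full run's duration from a timed sample run.

    Assumes processing time grows linearly with input size and that
    startup_seconds (cluster boot, job setup) is paid once per run.
    """
    if sample_bytes <= 0:
        raise ValueError("sample_bytes must be positive")
    if full_bytes < 0:
        raise ValueError("full_bytes must not be negative")
    # Time the sample spent on actual processing, excluding fixed overhead.
    processing = max(sample_seconds - startup_seconds, 0.0)
    # Scale processing time by the ratio of full input to sample input.
    return startup_seconds + processing * (full_bytes / sample_bytes)
```

For example, a 100 MB sample that took 60 seconds with 10 seconds of startup suggests about 510 seconds for 1 GB under these assumptions.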

igorgatis, Nov 05 '13 18:11