dataflow-runner icon indicating copy to clipboard operation
dataflow-runner copied to clipboard

Run templatable playbooks of Hadoop/Spark/et al jobs on Amazon EMR

Results 15 dataflow-runner issues
Sort by recently updated
recently updated
newest added

See https://github.com/snowplow/snowplow/issues/1263

The situation has been getting better wrt Spark jobs running on spot instances in EMR recently (https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-spark-configure.html) so it might be interesting to support them.

Instead of writing down id of created cluster - we can do following: 1. After `up` command create a `.cluster-id` file containing obviously cluster id in current directory 2. If...

This is a poor man's #17 - you use the EMR jobflow/cluster name to prevent multiple jobs from running at the same time. The jobflow will refuse to start if...

This would be based on name and age where rather than passing the jobflow ID you would pass the jobflow name and dataflow-runner would select the newest cluster available that...

I have observed (replicable issue) while running dataflow runner in `run-transient` with playbooks longer than 8 steps. The error one may observe is _400: Throughput exceeded_ issue. Once the error...

This command would check whether the currently running cluster spec is the same as what is specified in the config. If not it would trigger a resize. This would let...

The library we're using to handle avro records (https://github.com/elodina/go-avro) hasn't seen any activity for almost two years. Furthermore, https://github.com/linkedin/goavro seems to be more feature-complete.