Consider supporting spot instances

Open BenFradet opened this issue 7 years ago • 2 comments

Running Spark jobs on spot instances in EMR has been getting more practical recently (https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-spark-configure.html), so it might be worth supporting them.

BenFradet, Feb 22 '18 09:02

see snowplow/snowplow#3634

BenFradet, Feb 22 '18 10:02

I'd like to bump this request.

Recently we've run into out-of-capacity errors from EMR, which means we need to make our EMR clusters more flexible about how instances are provisioned. That requires instance fleets, which the dataflow-runner cluster config schema currently doesn't support.

Supporting instance fleets would bring a few benefits:

  • Multiple subnets (corresponding to different availability zones) can be specified, and the EMR cluster will be provisioned in whichever availability zone can satisfy the required cluster config. Currently only one subnet can be specified, so if an instance type is not available in that availability zone, cluster provisioning fails.
  • Multiple instance types can be specified. Currently only one instance type can be specified, but instance fleets support up to 30 different instance types, such as r5.xlarge, r5a.xlarge, r6g.xlarge etc., which gives more flexibility when provisioning.
  • On-demand and spot instances can be mixed in a single request, so that together they meet the target capacity.
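
To make the three points above concrete, here is a rough sketch of what an instance fleet request looks like at the EMR API level. It builds the `Instances` structure that EMR's RunJobFlow call accepts, using only field names from that API (`Ec2SubnetIds`, `InstanceFleets`, `TargetOnDemandCapacity`, `TargetSpotCapacity`, `InstanceTypeConfigs`). This is not the dataflow-runner schema: the function name, the subnet IDs, the capacities and the timeout value are all made up for illustration.

```python
from __future__ import annotations

from typing import Any


def build_instance_fleets(
    subnet_ids: list[str],
    core_instance_types: list[str],
    on_demand_units: int,
    spot_units: int,
) -> dict[str, Any]:
    """Build an EMR `Instances` structure that uses instance fleets.

    Hypothetical helper for illustration only; not part of dataflow-runner.
    """
    if not subnet_ids:
        raise ValueError("at least one subnet ID is required")
    if not core_instance_types:
        raise ValueError("at least one instance type is required")

    # Every candidate type counts as one unit towards the target capacity.
    core_type_configs = [
        {"InstanceType": instance_type, "WeightedCapacity": 1}
        for instance_type in core_instance_types
    ]

    return {
        # Several subnets: EMR picks one availability zone that has capacity.
        "Ec2SubnetIds": subnet_ids,
        "InstanceFleets": [
            {
                "Name": "master",
                "InstanceFleetType": "MASTER",
                "TargetOnDemandCapacity": 1,
                "InstanceTypeConfigs": [
                    {"InstanceType": core_instance_types[0]}
                ],
            },
            {
                "Name": "core",
                "InstanceFleetType": "CORE",
                # Mixed request: on-demand and spot units add up to the target.
                "TargetOnDemandCapacity": on_demand_units,
                "TargetSpotCapacity": spot_units,
                "InstanceTypeConfigs": core_type_configs,
                "LaunchSpecifications": {
                    "SpotSpecification": {
                        # Assumed values: fall back to on-demand if spot
                        # capacity is not fulfilled within 10 minutes.
                        "TimeoutDurationMinutes": 10,
                        "TimeoutAction": "SWITCH_TO_ON_DEMAND",
                    }
                },
            },
        ],
    }


# Placeholder subnet IDs and capacities, for illustration only.
instances = build_instance_fleets(
    subnet_ids=["subnet-aaa", "subnet-bbb"],
    core_instance_types=["r5.xlarge", "r5a.xlarge"],
    on_demand_units=2,
    spot_units=4,
)
```

With boto3, a structure like this would be passed as the `Instances` argument of the EMR client's `run_job_flow`. How it should map onto the dataflow-runner cluster config is an open question.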

Is it possible to support this?

VolatileBit, Feb 14 '23 07:02