
Use max of instance type per ASG

Status: Open. thorro opened this issue 6 years ago • 11 comments

Issue type

  • Feature Idea

At the moment, the cheapest spot instance type is used, which is fine. The issue arises when AWS reclaims capacity of one instance type: all of those instances can then go away at the same time.

We are willing to sacrifice some cost savings by diversifying instances more.

A new setting could be defined as:

  • max_instances_of_same_type = number or percentage

So AutoSpotting would only launch instances of a given type up to that limit. After that, it would look for the second-cheapest option, and so on.
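The proposed setting could work roughly like the sketch below. It is not AutoSpotting code: `candidate`, `maxAllowed` and `pickInstanceType` are hypothetical names, and it assumes the group's current per-type counts and the spot prices have already been fetched. Candidates are sorted by price, and the first type still under the cap wins.

```go
package main

import (
	"fmt"
	"sort"
)

// candidate is a hypothetical spot offer for one instance type.
type candidate struct {
	instanceType string
	spotPrice    float64 // current hourly spot price
}

// maxAllowed turns the proposed max_instances_of_same_type value into an
// instance count. The value is either an absolute number or, when isPercent
// is true, a percentage of the group's desired capacity. At least one
// instance per type is always allowed.
func maxAllowed(value float64, isPercent bool, desiredCapacity int) int {
	n := int(value)
	if isPercent {
		n = int(value / 100 * float64(desiredCapacity))
	}
	if n < 1 {
		n = 1
	}
	return n
}

// pickInstanceType returns the cheapest instance type that is still below the
// per-type cap, moving on to the second-cheapest option and so on. It reports
// false if every candidate has already reached the cap.
func pickInstanceType(cands []candidate, running map[string]int, maxPerType int) (string, bool) {
	sorted := append([]candidate(nil), cands...) // avoid reordering the caller's slice
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].spotPrice < sorted[j].spotPrice })
	for _, c := range sorted {
		if running[c.instanceType] < maxPerType {
			return c.instanceType, true
		}
	}
	return "", false
}

func main() {
	cands := []candidate{
		{"m5.large", 0.035},
		{"m5a.large", 0.032},
		{"m4.large", 0.040},
	}
	running := map[string]int{"m5a.large": 2, "m5.large": 1}
	limit := maxAllowed(50, true, 4) // 50% of 4 instances: at most 2 per type
	t, ok := pickInstanceType(cands, running, limit)
	fmt.Println(t, ok) // m5a.large is capped, so the next-cheapest type is chosen
}
```

With a 50% limit on a group of 4, the cheapest type (m5a.large) already has 2 instances, so the sketch falls back to m5.large. The cost savings given up are just the price difference between those two offers.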

thorro · Mar 25 '19 08:03

@thorro Thanks for reporting this issue.

We used to have such a feature hardcoded into the logic: it automatically switched the instance type if more than 20% of the instances were in the same AZ/type combination, since those instances would all be outbid and then terminated together if the spot price increased.
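A check of that kind can be sketched as below. This is an illustration of the described 20% rule, not the code at the commit linked later in the thread; `azType` and `overConcentrated` are hypothetical names, and the per-combination counts are assumed to be known.

```go
package main

import "fmt"

// azType identifies an availability zone / instance type combination.
type azType struct {
	az           string
	instanceType string
}

// overConcentrated reports whether more than maxShare (a fraction, e.g. 0.2
// for 20%) of the group's running instances share the given AZ/type
// combination, in which case a different instance type should be launched.
func overConcentrated(counts map[azType]int, key azType, maxShare float64) bool {
	total := 0
	for _, n := range counts {
		total += n
	}
	if total == 0 {
		return false // an empty group cannot be over-concentrated
	}
	return float64(counts[key])/float64(total) > maxShare
}

func main() {
	counts := map[azType]int{
		{"eu-west-1a", "m5.large"}: 3,
		{"eu-west-1a", "m4.large"}: 1,
		{"eu-west-1b", "m5.large"}: 6,
	}
	fmt.Println(overConcentrated(counts, azType{"eu-west-1a", "m5.large"}, 0.2)) // 3 of 10 instances
	fmt.Println(overConcentrated(counts, azType{"eu-west-1a", "m4.large"}, 0.2)) // 1 of 10 instances
}
```

Counting per AZ/type pair rather than per type matches how spot capacity pools are organized, so two AZs of the same type are treated as separate risks.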

About a year ago we removed it, because AWS no longer terminates all the instances at once. Instead, it randomly reclaims instances of a given instance type over time, regardless of the spot price and bid, which hopefully gives us time to launch replacements with a different instance type.

Ever since this changed, I haven't seen anyone complain that all their instances were gone.

Have you actually seen this happen in practice?

I would be open to re-adding this and making it configurable as you suggested, as long as:

  1. there are enough people who report this issue
  2. someone would contribute a PR for implementing it

Alternatively, I can implement it myself if at least a couple of Patrons ask for it.

cristim · Mar 25 '19 09:03

Hi @cristim

I don't have that much experience with this, as we don't run many spot instances yet.

About five days ago AWS terminated all 3 instances of the same type in one ASG. I think they were all t3.2xlarge.

But just on Friday I noticed that AWS terminated one i3.4xlarge spot instance while the other four kept running. So this looks more like the case you describe.

I think it may depend on how much capacity they need. If they need a lot, all or almost all of the instances could go down; if not, they take one here and there so as not to upset a single customer too much.

Could you post the removed hardcoded code, or a link to the commit? It would be a good starting point for our own modifications. Thanks.

thorro · Mar 25 '19 09:03

Considering how the spot market works, we can't exclude such scenarios, especially for popular instance types where there may be a lot of churn. Did your group lose all its capacity before any new instances were started?

It's definitely better to be prepared for this if possible, and as I said, we can bring this back if enough people complain about it.

As for the code, have a look around here:

https://github.com/AutoSpotting/AutoSpotting/blob/20fced19162c4ee1de87852fc7297e1bcf6c8353/core/instance.go#L147-L160

cristim · Mar 25 '19 10:03

That ASG lost all its capacity; luckily for us, it was not a production workload.

Hope some more people chime in. Thanks for the code pointer, will brush up on my Go skills. :)

thorro · Mar 25 '19 10:03

@ChienHuey just volunteered on GitHub to implement this as part of a hackathon. A few things I mentioned that may make it a bit more challenging:

I'd love for it to be configurable, similarly to how AWS does it for mixed-instance spot ASGs. That configurability work may need more than a full day:

  • it should be possible to toggle it on/off using stack parameters, overridable per group using tags, like we have for other config options
  • it should also control the level of instance type spread per type/AZ combination, maybe defaulting to 2 when enabled, with that value configurable to something higher, also via stack parameters and overridable by tags
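The "stack parameter default, tag override" pattern described above might be resolved like this. The struct, function and both tag names are hypothetical; AutoSpotting's real configuration code and tag names may differ. Invalid tag values are ignored so a typo on one group falls back to the stack-wide setting.

```go
package main

import (
	"fmt"
	"strconv"
)

// spreadConfig holds the hypothetical instance type spread settings.
type spreadConfig struct {
	Enabled      bool
	MaxPerTypeAZ int // max instances per instance type/AZ combination
}

// Hypothetical tag names, for illustration only.
const (
	tagSpreadEnabled = "autospotting_spread_enabled"
	tagMaxPerTypeAZ  = "autospotting_max_per_type_az"
)

// resolveSpreadConfig starts from the stack-level defaults and applies any
// per-ASG tag overrides. Unparseable or out-of-range tag values are ignored.
func resolveSpreadConfig(stack spreadConfig, asgTags map[string]string) spreadConfig {
	cfg := stack
	if cfg.MaxPerTypeAZ < 1 {
		cfg.MaxPerTypeAZ = 2 // default spread suggested in the thread
	}
	if v, ok := asgTags[tagSpreadEnabled]; ok {
		if b, err := strconv.ParseBool(v); err == nil {
			cfg.Enabled = b
		}
	}
	if v, ok := asgTags[tagMaxPerTypeAZ]; ok {
		if n, err := strconv.Atoi(v); err == nil && n >= 1 {
			cfg.MaxPerTypeAZ = n
		}
	}
	return cfg
}

func main() {
	stack := spreadConfig{} // stack parameters: feature disabled, no explicit spread
	tags := map[string]string{tagSpreadEnabled: "true", tagMaxPerTypeAZ: "3"}
	fmt.Printf("%+v\n", resolveSpreadConfig(stack, tags))
}
```

Here the group's tags enable the feature and raise the spread to 3, while groups without tags keep the stack defaults.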

cristim · Apr 12 '19 14:04

@ChienHuey do you have any progress on this work?

cristim · Sep 18 '19 16:09

@thorro is this issue still of interest to you?

@ChienHuey let me know if you're still interested to work on this.

cristim · Mar 06 '23 16:03

@cristim no, we don't use AutoSpotting at the moment.

thorro · Mar 06 '23 16:03

Thanks @thorro!

I'd love to learn why, and what you're using instead, as well as any other feedback about AutoSpotting you may have.

BTW, last week I released this open source Spot savings estimator tool: https://github.com/LeanerCloud/savings-estimator/

I hope you find it useful, and I'd also love to hear some honest feedback about it.

cristim · Mar 06 '23 16:03

We've moved to EKS managed node groups. Would AutoSpotting work with those at all?

Will check out the tool, thanks!

thorro · Mar 07 '23 08:03

Yes, it should work, but not if you configured them to use Spot. I'm intentionally skipping those in order not to interfere with them or cause race conditions.

cristim · Mar 07 '23 15:03