aws-ethereum-miner

Instance g4dn.xlarge not starting even after the CF template successfully created the stack

Open vinodvarma24 opened this issue 3 years ago • 13 comments

Screen Shot 2021-01-21 at 2 10 42 PM

Screen Shot 2021-01-21 at 2 11 43 PM

Screen Shot 2021-01-21 at 2 12 11 PM

As you can see from the stack outputs and the dashboard URL, the worker did not start.

I have recreated the CloudFormation stack from the template multiple times, but no luck. Could you shed some light on this?

Thanks in advance,

vinodvarma24 avatar Jan 21 '21 13:01 vinodvarma24

What do you get in the Auto scaling group "Events" tab? There may be some hints on why it fails to spin up the instances.
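The same activity history can also be pulled with the AWS CLI, which is often easier to paste into an issue than a screenshot. A minimal sketch, assuming the AWS CLI is installed and configured; `show_asg_failures` is a hypothetical helper name, and the group name and region are placeholders to be taken from your own stack:

```shell
# Hypothetical helper: print the failed scaling activities of one Auto Scaling group.
# Usage: show_asg_failures <asg-name> [region]
# The region defaults to us-east-2 (Ohio) only because that is the region
# discussed in this thread; pass your own as the second argument.
show_asg_failures() {
  aws --region "${2:-us-east-2}" autoscaling describe-scaling-activities \
    --auto-scaling-group-name "$1" \
    --max-items 10 \
    --query "Activities[?StatusCode=='Failed'].[StartTime,StatusMessage]" \
    --output text
}
```

The `StatusMessage` column should carry the same text as the "Status Reason" shown in the console's activity history.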

mludvig avatar Jan 22 '21 03:01 mludvig

Screen Shot 2021-01-23 at 12 17 02 PM

I checked the Auto Scaling group Events tab. There seems to be an issue with the low number of spot instances available in the Ohio region, so the instances are not starting up. What is the best way to avoid this?

Should I run the CloudFormation template in Virginia or Oregon instead? Or should I reduce the auto-scaling capacity from 0-5 to something else?

vinodvarma24 avatar Jan 23 '21 11:01 vinodvarma24

I have figured it out. My AWS account did not have the required Spot limit.

vinodvarma24 avatar Jan 23 '21 23:01 vinodvarma24

I'm facing a similar issue. In the Events tab of the Auto Scaling group I see:

"Launching a new EC2 instance. Status Reason: We currently do not have sufficient g4dn.xlarge capacity in the Availability Zone you requested (us-east-1a). Our system will be working on provisioning additional capacity. You can currently get g4dn.xlarge capacity by not specifying an Availability Zone in your request or choosing us-east-1b, us-east-1c, us-east-1d, us-east-1f. Launching EC2 instance failed."

Should I wait for the service request to be processed?

getSwiftly avatar Nov 03 '21 08:11 getSwiftly

The g4dn.xlarge instances have availability issues across all the cheapest regions. However, you can currently run them in the Los Angeles (LAX) local zone (us-west-2-lax-1) for the same spot price as in Oregon. All you have to do is:

  1. Opt-in to the LAX zone with:
    aws --region us-west-2 ec2 modify-availability-zone-group --group-name us-west-2-lax-1 --opt-in-status opted-in
    
  2. Create new default subnets in the VPC in the 2 LAX AZs:
    aws --region us-west-2 ec2 create-default-subnet --availability-zone us-west-2-lax-1a
    aws --region us-west-2 ec2 create-default-subnet --availability-zone us-west-2-lax-1b
    
  3. Delete and re-deploy the CFN stack (because it picks up the AZ list when it's getting created).
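Before re-deploying, it may be worth confirming that the opt-in and the new subnets took effect. A rough sketch, assuming the same region and zone group as in the steps above; `zone_group_status` and `default_subnets` are hypothetical helper names, not part of the repository:

```shell
# Hypothetical helper: show the opt-in status of every zone in a zone group.
# Usage: zone_group_status [group-name] [region]
zone_group_status() {
  aws --region "${2:-us-west-2}" ec2 describe-availability-zones \
    --all-availability-zones \
    --filters "Name=group-name,Values=${1:-us-west-2-lax-1}" \
    --query 'AvailabilityZones[].[ZoneName,OptInStatus]' \
    --output text
}

# Hypothetical helper: list the default subnets and the zone each one lives in.
# Usage: default_subnets [region]
default_subnets() {
  aws --region "${1:-us-west-2}" ec2 describe-subnets \
    --filters "Name=default-for-az,Values=true" \
    --query 'Subnets[].[AvailabilityZone,SubnetId]' \
    --output text
}
```

If the LAX zones report `opted-in` and each has a default subnet, the re-created stack should be able to pick them up.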

mludvig avatar Nov 03 '21 08:11 mludvig

Thanks, it got created 20-30 minutes after I recreated the stack. How can I check what spot price I am getting on that instance?

getSwiftly avatar Nov 03 '21 08:11 getSwiftly
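One way to look up the current spot price asked about above is the EC2 spot price history API. A minimal sketch, assuming a Linux instance and the AWS CLI; `spot_price_now` is a hypothetical helper name, and the instance type and region defaults simply mirror this thread:

```shell
# Hypothetical helper: print the spot price currently in effect per zone.
# Usage: spot_price_now [instance-type] [region]
# Setting --start-time to "now" makes the API return the last price change
# before that moment, i.e. the price in effect right now for each zone.
spot_price_now() {
  aws --region "${2:-us-west-2}" ec2 describe-spot-price-history \
    --instance-types "${1:-g4dn.xlarge}" \
    --product-descriptions "Linux/UNIX" \
    --start-time "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
    --query 'SpotPriceHistory[].[AvailabilityZone,SpotPrice]' \
    --output text
}
```

Compare the row for the zone your instance runs in; the zone is shown in the instance details in the EC2 console.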

@mludvig - I tried this, but when the ASG is created I'm getting the same error, as shown in the screenshot below. In my case only 2 out of 10 instances are getting created, in the us-west-2-lax-1a and us-west-2-lax-1b zones.

Also, the ASG details show the Availability Zones us-west-2a, us-west-2b, us-west-2-lax-1b, us-west-2-lax-1a, us-west-2c, us-west-2d, which means us-west-2-lax-1a and us-west-2-lax-1b were added.

I wonder if this is due to the Spot limit. I've followed the steps as suggested.

Screenshot 2021-11-04 at 9 40 33 PM Screenshot 2021-11-04 at 9 30 54 PM

wilfi avatar Nov 04 '21 16:11 wilfi

Hi @wilfi

The log message says:

Max spot instance count exceeded.

You'll have to raise a support request to increase the spot instance quota for your account. The quota is in vCPU units and each g4dn.xlarge has 4 vCPUs, so increasing it to 40 will give you enough capacity for 10 instances in the region.

See: Increasing resource quotas in the README file.
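The arithmetic behind the quota value, plus a way to see the current Spot quotas, can be sketched as follows. Both function names are hypothetical helpers; the 4 vCPUs per g4dn.xlarge comes from the comment above, and the name filter on "Spot" is an assumption about how the relevant quotas are labelled:

```shell
# Hypothetical helper: vCPU quota needed = instance count * vCPUs per instance.
# Usage: required_spot_vcpus <instances> [vcpus-per-instance]
# The default of 4 vCPUs matches g4dn.xlarge.
required_spot_vcpus() {
  echo $(( $1 * ${2:-4} ))
}

# Hypothetical helper: list EC2 quotas whose name mentions "Spot",
# with their quota code and current value.
# Usage: list_spot_quotas [region]
list_spot_quotas() {
  aws --region "${1:-us-west-2}" service-quotas list-service-quotas \
    --service-code ec2 \
    --query "Quotas[?contains(QuotaName, 'Spot')].[QuotaName,QuotaCode,Value]" \
    --output text
}

required_spot_vcpus 10   # 10 instances * 4 vCPUs = 40
```

Quotas are per region, so the value has to be checked in the region where the stack is deployed.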

mludvig avatar Nov 04 '21 21:11 mludvig

How long does it take for AWS to increase quotas? I still haven't heard back after a few days.

0xtruth avatar Apr 13 '22 15:04 0xtruth

How do I launch the CF template via the AWS CLI? o.o

d4op avatar May 26 '22 13:05 d4op

@d4op don't add unrelated comments to existing issues. Open a new ticket and I'll tell you how to do it with aws cli ;)

mludvig avatar May 27 '22 07:05 mludvig

@0xtruth GPU limit increases often take them a few days to process, or they come back asking for more info. Unfortunately, they quite often reject the request when it doesn't come with a good justification.

mludvig avatar May 27 '22 07:05 mludvig

"I have figured it out. My AWS account did not have the required Spot limit."

What was the solution for this? I am facing the same issue.

MoAdelAbdelrahman avatar Jun 05 '22 23:06 MoAdelAbdelrahman