
Expose parameters for 'delay' and 'max-attempts'

Open arvan-pritchard opened this issue 9 years ago • 31 comments

I'm trying to migrate to the unified cli, and to use waiters instead of polling loops, but the conversion-task-completed waiter consistently times out in us-east-1 for my 12GB disk.

It reports "Waiter ConversionTaskCompleted failed: Max attempts exceeded", but the task does complete successfully. I can write a loop which calls the waiter, but I'd prefer to increase the number of attempts, or reduce their frequency. They appear to be taken from botocore/data/aws/ec2/2014-09-01.waiters.json, which contains the following (a 15-second delay and 40 attempts, i.e. a 10-minute ceiling):

"ConversionTaskCompleted": {
    "delay": 15,
    "operation": "DescribeConversionTasks",
    "maxAttempts": 40,

I'd like these to be exposed as parameters in the aws cli command, or perhaps for the timeout to be derived from the size of the disk being converted.

arvan-pritchard avatar Apr 17 '15 10:04 arvan-pritchard

@arvan-pritchard

Being able to modify the delays and max attempts is something that we want to expose. We just need to put in the work to expose the parameters. As for your timeout, how many wait invocations does it take for your command to succeed (i.e. not time out)? In the meantime, we can bump up the delay and max attempts for that particular waiter. We just need to know how far off the wait currently is.

kyleknap avatar Apr 17 '15 16:04 kyleknap

Sorry, I've not measured it - while I was trying to use a single invocation of the waiter, it failed 3 times, but the conversion was nearly or fully complete by the time I noticed and checked the task status manually.

Our previous code using the old APIs allowed 1 hour for this step, and I have now coded a loop repeating the waiter for up to an hour and that works.

While developing the script I used the Frankfurt region and never saw the waiter time out, so I'd guess that it probably completes on the second wait invocation.
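
For reference, a minimal sketch of that kind of retry-the-waiter loop (the $conversion_task_id variable and the one-hour budget are placeholders):

# Re-invoke the waiter until the task completes or an overall one-hour
# budget is exhausted. Each 'aws ec2 wait' call polls for up to its
# built-in 10 minutes (15s delay x 40 attempts) before exiting non-zero.
deadline=$(( $(date +%s) + 3600 ))
until aws ec2 wait conversion-task-completed --conversion-task-ids "$conversion_task_id"; do
  if (( $(date +%s) >= deadline )); then
    echo "Conversion task did not complete within an hour" >&2
    exit 1
  fi
  echo "Waiter timed out, retrying..."
done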

arvan-pritchard avatar Apr 20 '15 09:04 arvan-pritchard

No worries. We will look into increasing the delay time or max attempts and enabling the adjustment of these parameters as well.

kyleknap avatar Apr 20 '15 16:04 kyleknap

I'm currently encountering this timeout issue with the similar aws ec2 wait image-available operation. Is this still unresolved?

wjordan avatar Apr 20 '16 23:04 wjordan

@wjordan we get around it by retrying the wait in a loop

aehlke avatar Apr 21 '16 15:04 aehlke

As mentioned in one of the comments above, similar behavior is observed using image-available:

aws ec2 wait image-available --image-ids ${image-id} --filter "Name=state,Values=available"

This times out after 20 minutes or so.

0xmohit avatar Jul 14 '16 07:07 0xmohit

+1 on this. I've been having issues with aws ec2 wait volume-available and aws ec2 wait snapshot-completed timing out as well for large volumes (75GB+), making this feature completely useless.

sgnn7 avatar Feb 27 '17 18:02 sgnn7

We ultimately solved this by switching to CloudFormation, via troposphere :)

aehlke avatar Feb 28 '17 20:02 aehlke

I solved it by doing an explicit check for volume state and snapshot progress until it's solved upstream. Feel free to copy/paste from here.
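
In case the link above goes stale, here is a minimal sketch of the same idea (this is not the linked script; the snapshot ID is a placeholder):

snapshot_id="snap-0123456789abcdef0"   # placeholder

# Poll DescribeSnapshots directly instead of relying on the waiter's
# fixed 10-minute ceiling.
while true; do
    read -r state progress < <(aws ec2 describe-snapshots \
        --snapshot-ids "$snapshot_id" \
        --query 'Snapshots[0].[State,Progress]' --output text)
    echo "Snapshot ${snapshot_id}: ${state} (${progress})"
    [[ "$state" == "completed" ]] && break
    [[ "$state" == "error" ]] && { echo "Snapshot failed" >&2; exit 1; }
    sleep 30
done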

sgnn7 avatar Feb 28 '17 20:02 sgnn7

Good Morning!

We're closing this issue here on GitHub, as part of our migration to UserVoice for feature requests involving the AWS CLI.

This will let us get the most important features to you, by making it easier to search for and show support for the features you care the most about, without diluting the conversation with bug reports.

As a quick UserVoice primer (if not already familiar): after an idea is posted, people can vote on the ideas, and the product team will be responding directly to the most popular suggestions.

We’ve imported existing feature requests from GitHub - Search for this issue there!

And don't worry, this issue will still exist on GitHub for posterity's sake. As it’s a text-only import of the original post into UserVoice, we’ll still be keeping in mind the comments and discussion that already exist here on the GitHub issue.

GitHub will remain the channel for reporting bugs.

Once again, this issue can now be found by searching for the title on: https://aws.uservoice.com/forums/598381-aws-command-line-interface

-The AWS SDKs & Tools Team

ASayre avatar Feb 06 '18 10:02 ASayre

Based on community feedback, we have decided to return feature requests to GitHub issues.

jamesls avatar Apr 06 '18 21:04 jamesls

Did a fix come for this issue? I still get "Waiter SnapshotCompleted failed: Max attempts exceeded" when trying to wait for a snapshot creation to complete. My snapshot is 70GB, so it does take a while to finish.

Ashbhaic avatar May 15 '18 17:05 Ashbhaic

Second on this. I have volumes up in the TB range. Am seeing:

Waiter SnapshotCompleted failed: Max attempts exceeded

when attempting to take a snapshot. Not sure how to proceed from here.

johnbbell avatar Jun 26 '18 21:06 johnbbell

Same here for RDS snapshots. ;(

bknowles avatar Sep 04 '18 19:09 bknowles

same here.

GautierAtWork avatar Dec 27 '18 15:12 GautierAtWork

Same here for CloudFront invalidation.

kkushimoto avatar Feb 01 '19 00:02 kkushimoto

We received this 'failure' last night during some maintenance work:

17:58:41 aws rds wait db-snapshot-completed --db-snapshot-identifier mynewsnapshot --db-instance-identifier mydbinstance
18:08:29 RDS: Waiter DBSnapshotCompleted failed: Max attempts exceeded

The DB we were doing the snapshot on was only about 10GB in size.

Any chance this 4-year-old feature request will be implemented soon?

john-mclean avatar May 01 '19 16:05 john-mclean

Same here for ecs wait services-stable with the default deregistration delay.

bogdanbarna avatar Nov 29 '19 15:11 bogdanbarna

Hi there, is there any update on this issue? It would be great to have the number of attempts for aws ecs wait services-stable be configurable. Thanks

bilbof avatar Nov 17 '20 17:11 bilbof

I found this thread when I was trying to increase the timeout for an RDS snapshot copy. Ended up forking the script @sgnn7 provided to support RDS Snapshots... if anyone wants it, they can find it here

seano-vs avatar Dec 17 '20 16:12 seano-vs

I'm encountering the same problem with aws ec2 wait image-available. Our pipeline errors out after 10 minutes if the image is not available. We created a bash script as a workaround, but using this CLI command would be much simpler and easier.

Are there any updates on this?

Mateusz-Janowski avatar Oct 18 '21 13:10 Mateusz-Janowski

@tim-finnigan as you’ve closed #2849 in favor of this feature request I’m commenting here.

#2849 was about the problem that the waiter times out when the resource reaches a state from which the desired condition can no longer be met. Resource types have defined state transitions; e.g. an EC2 instance can go from…

  • launching -> running
  • launching -> terminated
  • running -> stopped
  • running -> terminated
  • stopped -> running
  • stopped -> terminated

However, once an instance is terminated it can no longer reach any other state. So, if I have a waiter that waits for a state that can no longer be reached, I would prefer the waiter to fail as soon as it detects this, with an error indicating that the desired state is unreachable. It simply doesn't make sense to wait for the timeout when you already know the condition cannot be met.
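
To illustrate the fail-fast behaviour being asked for, a rough client-side sketch (the instance ID is a placeholder; this is explicit polling, not an existing waiter option):

instance_id="i-0123456789abcdef0"   # placeholder

# Wait for 'running', but give up immediately if the instance reaches a
# terminal state from which 'running' can never be reached.
while true; do
    state=$(aws ec2 describe-instances --instance-ids "$instance_id" \
        --query 'Reservations[0].Instances[0].State.Name' --output text)
    case "$state" in
        running)
            echo "Instance is running"; break ;;
        shutting-down|terminated)
            echo "Instance is ${state}; 'running' can no longer be reached" >&2
            exit 1 ;;
        *)
            sleep 15 ;;
    esac
done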

muhqu avatar Oct 30 '21 08:10 muhqu

Hi @muhqu, thanks for following up. I may have misunderstood your original post. I’m going to reopen #2849 and respond there so that we can keep this issue focused on the feature request to expose parameters for delay/max attempts.

tim-finnigan avatar Oct 30 '21 15:10 tim-finnigan

Restoring a dynamodb table from a snapshot (100k rows) takes around 30-60 minutes, but the job fails at around 10 minutes.

I've tried these environment variables, but they don't seem to work:

  AWS_MAX_ATTEMPTS: 500
  AWS_POLL_DELAY_SECONDS: 30
aws dynamodb wait table-exists --table-name [table name]
Waiter TableExists failed: Max attempts exceeded

Lusitaniae avatar Nov 10 '21 18:11 Lusitaniae

@Lusitaniae for information on general retry configuration you can refer to this documentation: https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-retries.html. Also this comment provides context on when it was added and which versions support it.

AWS_POLL_DELAY_SECONDS is not a documented AWS CLI environment variable, but for issues dealing with AWS_MAX_ATTEMPTS I think this is a better place to discuss: https://github.com/aws/aws-cli/issues/5653
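
For reference, the retry settings from that documentation look like this; note they control how individual API requests are retried on errors and throttling, not how many times a waiter polls, which is why they don't lengthen a waiter's timeout:

# ~/.aws/config
[default]
retry_mode = standard
max_attempts = 5

# Or as environment variables:
export AWS_RETRY_MODE=standard
export AWS_MAX_ATTEMPTS=5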

tim-finnigan avatar Nov 11 '21 22:11 tim-finnigan

Having the same issue with aws ec2 wait image-available: the built-in maximum wait time is 10 minutes, but it takes ~40-60 minutes to complete the import task for a blank Windows image.


Workaround to copy for anyone using aws ec2 wait image-available. Copied and modified from somewhere in this thread (can't remember which link).

import_task_id=$(aws ec2 import-image ... --query 'ImportTaskId' --output text)

# Wait for image completion
while true; do
  # Build the command as an array so the JMESPath query is passed as a single argument
  import_task_status_command=(aws ec2 describe-import-image-tasks --import-task-ids "${import_task_id}" --query 'ImportImageTasks[0].Status' --output text)
  echo "Running command: ${import_task_status_command[*]}"
  import_task_status=$("${import_task_status_command[@]}")
  echo "Import task [${import_task_id}] status is [${import_task_status}]."

  if [[ "$import_task_status" == "completed" ]]; then
    echo "Completed, exiting..."
    break
  elif [[ "$import_task_status" == "active" ]]; then
    echo "Waiting 1 minute..."
    sleep 60
  else
    echo "Error, exiting..."
    exit 1
  fi
done

tullydwyer avatar Feb 15 '22 02:02 tullydwyer

Instead of querying the task status, you can query the image status directly too.

In my scripts I have replaced

aws ec2 wait image-available --image-ids $imageId

with

wait_for_image_available $imageId

function wait_for_image_available(){
    local  _imageId=$1
    while true; do
        image_status=$(aws ec2 describe-images --image-ids ${_imageId} --query 'Images[0].State' --output text)
        echo "Image [${_imageId}] status is [${image_status}]."

        if [[ "$image_status" == "available" ]]; then
            echo "Image is available now. Continuing.. "
            break
        elif [[ "$image_status" == "pending" ]]; then
            echo "Waiting 30 seconds.."
            sleep 30
        else
            echo "Error, exiting.."
            exit 1
        fi
    done
}

charsi avatar Mar 17 '22 17:03 charsi

This is definitely a bug and not a feature request. I have a very basic, canonical use case:

  • run a model in EC2
  • upload the results to S3
  • shutdown + terminate EC2 instance

Steps 2 and 3 have to happen synchronously, otherwise the instance will terminate and lose all the results.
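
A rough sketch of that sequence (bucket name, key, and paths are placeholders; the instance-ID lookup assumes IMDSv1 is available):

# 1. Upload the results; 'aws s3 cp' blocks until the transfer finishes
#    and exits non-zero on failure.
aws s3 cp ./results s3://my-bucket/results/ --recursive || exit 1

# 2. Optionally confirm a marker object is visible before tearing down.
aws s3api wait object-exists --bucket my-bucket --key results/DONE

# 3. Terminate this instance.
instance_id=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)
aws ec2 terminate-instances --instance-ids "$instance_id"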

openSourcerer9000 avatar Nov 15 '22 17:11 openSourcerer9000

Hi @openSourcerer9000 thanks for reaching out. Here are the existing waiters for EC2 and here are the waiters for S3 APIs. Is there a specific waiter you're having issues with or would you like to request a new waiter? Service teams own the creation of their waiter models that are used across AWS SDKs and the CLI, so if you were requesting that a new one be added then I can forward that request to the appropriate team.

tim-finnigan avatar Nov 21 '22 17:11 tim-finnigan

function wait_for_image_available(){
    local _imageId=$1
    while true; do
        image_status=$(aws ec2 describe-images --image-ids ${_imageId} --query 'Images[0].State' --output text)
        echo "Image [${_imageId}] status is [${image_status}]."

        if [[ "$image_status" == "available" ]]; then
            echo "Image is available now. Continuing.. "
            break
        elif [[ "$image_status" == "pending" ]]; then
            echo "Waiting 30 seconds.."
            sleep 30
        else
            echo "Error, exiting.."
            exit 1
        fi
    done
}

This coding has to be done within the file exploration. The file expose parameters differ from the main file criteria and are given to the car of the file touching parameters.

expose parameters for delay and max attempts. #1295

amberkushwaha avatar Jun 12 '23 05:06 amberkushwaha