ecr-buildkite-plugin

DO NOT MERGE: Minimal repro for "trap breaks retry" problem

Open • lucaswilric opened this issue 5 months ago • 0 comments

  • Creates a new function busted_login, which always fails
  • Calls busted_login instead of login on execution
  • Creates a repro script, env-hook-repro.sh, which simulates the behaviour of the Elastic CI Stack's agent environment hook (the real thing is here)
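For readers without the repro script at hand, the setup it simulates can be approximated in a few lines. This is a hypothetical sketch, not the contents of env-hook-repro.sh. It assumes the hook runs with `set -eE` and installs an ERR trap whose handler prints the two lines and exits with the status 53 seen in the logs below. A function that simply runs `false` stands in for `busted_login`.

```bash
#!/usr/bin/env bash
# Hypothetical stand-in for the Elastic CI Stack environment hook's error
# handling. Message text and exit status 53 come from this issue's logs;
# the use of set -eE is an assumption.

handle_err() {
  echo "^^^ +++"
  echo ":alert: Elastic CI Stack environment hook failed"
  exit 53
}

# Stand-in for the repro's busted_login: it always fails.
busted_login() {
  false
}

# Run the "hook" in a subshell so this demo can report its exit status.
(
  set -eE               # errexit + errtrace (assumed): the trap reaches functions
  trap handle_err ERR
  busted_login
  echo "not reached"
)
hook_status=$?
echo "hook exited with status $hook_status"
```

Any failure that reaches the trap ends the hook with status 53. That is why an unretried login failure surfaces only as the ":alert:" message.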

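Run the repro script to see how the ERR trap set by the Elastic CI Stack's environment hook breaks the retry function defined in this plugin's environment hook. The trap fires on the first failed login attempt, even though the retry loop disables errexit with set +e around the command.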
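The underlying Bash behaviour can be shown without the plugin at all. In this minimal sketch, the handler name `on_err` and the `trap_fired` flag are illustrative. `set +e` switches off errexit, but the ERR trap is a separate mechanism and still runs when `false` fails. Bash only exempts a few kinds of failures from the ERR trap: those in `if`/`while`/`until` conditions, in non-final parts of `&&` and `||` lists, in non-final pipeline stages, and in commands negated with `!`.

```bash
#!/usr/bin/env bash
# Minimal illustration: turning errexit off with set +e does not remove
# an ERR trap, so a failing command still runs the trap handler.

trap_fired=0
on_err() {
  trap_fired=1
  echo "ERR trap fired"
}

set -e
trap on_err ERR

set +e          # errexit is off...
false           # ...but the ERR trap still runs for this failure
set -e

echo "script continues; trap_fired=$trap_fired"
```

In the hook, the handler calls `exit 53` instead of just recording the event. The first failed attempt therefore ends the whole hook.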

We expect:

Expected execution log
$ ./env-hook-repro.sh
Retries: 3
Trying ECR login with max 4 attempts
+ ((  attempt_num <= max_attempts  ))
+ set +e
+ ./fail.sh
./hooks/environment: line 71: ./fail.sh: No such file or directory
+ exit_code=127
+ set -e
+ [[ 3 -eq 0 ]]
+ [[ 127 -eq 0 ]]
+ ((  attempt_num == max_attempts  ))
+ echo 'Login failed on attempt 1 of 4. Trying again in 1 seconds...'
Login failed on attempt 1 of 4. Trying again in 1 seconds...
+ sleep 1
+ ((  attempt_num <= max_attempts  ))
+ set +e
+ ./fail.sh
./hooks/environment: line 71: ./fail.sh: No such file or directory
+ exit_code=127
+ set -e
+ [[ 3 -eq 0 ]]
+ [[ 127 -eq 0 ]]
+ ((  attempt_num == max_attempts  ))
+ echo 'Login failed on attempt 2 of 4. Trying again in 2 seconds...'
Login failed on attempt 2 of 4. Trying again in 2 seconds...
+ sleep 2
+ ((  attempt_num <= max_attempts  ))
+ set +e
+ ./fail.sh
./hooks/environment: line 71: ./fail.sh: No such file or directory
+ exit_code=127
+ set -e
+ [[ 3 -eq 0 ]]
+ [[ 127 -eq 0 ]]
+ ((  attempt_num == max_attempts  ))
+ echo 'Login failed on attempt 3 of 4. Trying again in 3 seconds...'
Login failed on attempt 3 of 4. Trying again in 3 seconds...
+ sleep 3
+ ((  attempt_num <= max_attempts  ))
+ set +e
+ ./fail.sh
./hooks/environment: line 71: ./fail.sh: No such file or directory
+ exit_code=127
+ set -e
+ [[ 3 -eq 0 ]]
+ [[ 127 -eq 0 ]]
+ ((  attempt_num == max_attempts  ))
+ echo 'Login failed after 4 attempts'
Login failed after 4 attempts
+ return 127
^^^ +++
:alert: Elastic CI Stack environment hook failed

$ echo $?
53
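The expected trace corresponds roughly to a loop like the following. This is a hypothetical reconstruction from the `set -x` output above, not the hook's actual source. The name `retry`, the rule `max_attempts = retries + 1`, and the linear backoff of `attempt_num` seconds are all inferred from the trace, and `false` stands in for the failing login command. No ERR trap is installed here, so every attempt runs.

```bash
#!/usr/bin/env bash
# Hypothetical reconstruction of the retry loop visible in the expected
# trace; names and structure are inferred from the set -x output.

retry() {
  local retries=$1; shift
  local max_attempts=$((retries + 1))
  local attempt_num=1
  local exit_code=0

  echo "Trying ECR login with max $max_attempts attempts"
  while (( attempt_num <= max_attempts )); do
    set +e              # let the command fail without errexit
    "$@"
    exit_code=$?
    set -e
    if [[ $exit_code -eq 0 ]]; then
      return 0
    fi
    if (( attempt_num == max_attempts )); then
      echo "Login failed after $attempt_num attempts"
      return "$exit_code"
    fi
    echo "Login failed on attempt $attempt_num of $max_attempts. Trying again in $attempt_num seconds..."
    sleep "$attempt_num"  # linear backoff: 1s, 2s, 3s, ...
    attempt_num=$((attempt_num + 1))
  done
}

# A single retry keeps the demo short (one 1-second pause).
retry_status=0
retry 1 false || retry_status=$?
echo "retry returned $retry_status"
```

Called as `retry 1 false`, it makes two attempts with one 1-second pause and then returns the failing command's status.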

We see:

Actual execution log
$ ./env-hook-repro.sh  
Retries: 3
Trying ECR login with max 4 attempts
++ ((  attempt_num <= max_attempts  ))
++ set +e
++ ./fail.sh
hooks/environment: line 71: ./fail.sh: No such file or directory
+++ handle_err
+++ echo '^^^ +++'
^^^ +++
+++ echo ':alert: Elastic CI Stack environment hook failed'
:alert: Elastic CI Stack environment hook failed
+++ exit 53
^^^ +++
:alert: Elastic CI Stack environment hook failed
$ echo $?
53
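One way the retry could coexist with such a trap, offered as a suggestion rather than necessarily how the plugin resolves this, is to run the command on the left-hand side of an `||` list instead of toggling `set +e` / `set -e`. Bash exempts that position from both errexit and the ERR trap. Every attempt then runs, and the trap only fires once the retry function itself returns a failure, which matches the expected log. The handler text and status 53 are taken from the logs. `set -eE`, the `retry` name, and its structure are assumptions.

```bash
#!/usr/bin/env bash
# Sketch: a retry loop that coexists with an ERR trap. The handler text and
# status 53 mirror this issue's logs; set -eE and all names are assumptions.

handle_err() {
  echo "^^^ +++"
  echo ":alert: Elastic CI Stack environment hook failed"
  exit 53
}

retry() {
  local max_attempts=$(( $1 + 1 )); shift
  local attempt_num=1 exit_code
  while (( attempt_num <= max_attempts )); do
    exit_code=0
    # The left side of || is exempt from errexit and the ERR trap.
    "$@" || exit_code=$?
    if (( exit_code == 0 )); then
      return 0
    fi
    if (( attempt_num == max_attempts )); then
      echo "Login failed after $attempt_num attempts"
      return "$exit_code"
    fi
    echo "Login failed on attempt $attempt_num of $max_attempts. Trying again in $attempt_num seconds..."
    sleep "$attempt_num"
    attempt_num=$(( attempt_num + 1 ))
  done
}

# Run the "hook" in a command substitution so the demo can inspect it.
hook_output=$(
  set -eE
  trap handle_err ERR
  retry 1 false   # both attempts run; only the final failure hits the trap
)
hook_status=$?
echo "$hook_output"
echo "hook exited with status $hook_status"
```

The hook still ends with status 53 when login ultimately fails. The difference is that it gets there only after the configured retries are exhausted.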

lucaswilric • Mar 04 '24 05:03