Repeated rollback deployments can create excess processes
While trying to create a large number of revisions on an app, I repeatedly rolled back and forth between two revisions. Eventually my deployments started to fail. When @cwlbraa and I investigated further, we found:
- There were 100 Deployments associated with the app
- There were ~3100 "web" processes associated with the app
- The deployment updater logs showed no obvious failures
- There were 100 Revisions associated with the app
We suspect PruneExcessAppRevisions eventually deleted revisions 1 and 2 (each app can have at most 100 revisions), but that doesn't explain how we got thousands of web processes.
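The counts above can be confirmed directly against the v3 API. A minimal sketch, assuming `jq` is installed and the app is named `dora` (`cf curl` and the endpoints below are standard CAPI v3):

```zsh
# Count processes, revisions, and deployments for the app (sketch; assumes jq)
guid=$(cf app dora --guid)
cf curl "/v3/apps/${guid}/processes?per_page=1" | jq .pagination.total_results   # ~3100
cf curl "/v3/apps/${guid}/revisions?per_page=1" | jq .pagination.total_results   # 100
cf curl "/v3/deployments?app_guids=${guid}&per_page=1" | jq .pagination.total_results  # 100
```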
Context
I was using "dora" as the app in a single-instance configuration. Then executed rollbacks in rapid succession Rollbacks, pushes, app summary requests failed
Steps to Reproduce
- push dora 3x
- run a script to roll back repeatedly (the zsh script I used is below)
```zsh
# Alternate `cf rollback` between revisions 1 and 2, 10,000 times
function ten-thousand-revisions-dora() {
  i=0
  while [ $i -lt 10000 ]; do
    if (( i % 2 == 0 )); then
      cf rollback dora --revision 2 -f
    else
      cf rollback dora --revision 1 -f
    fi
    i=$((i + 1))
  done
}
```
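Going from zero to the failure state looked roughly like this (a sketch; it assumes the dora app source is in the working directory and that each push creates a new revision):

```zsh
cf push dora -i 1   # repeat 3x to seed revisions 1-3
ten-thousand-revisions-dora
```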
After running the script again (slightly modified from the one above, since revisions 1 and 2 had been pruned; a sketch of the modification follows the output), we see the following error states:
```
This command is in EXPERIMENTAL stage and may change without notice
Rolling back to revision 3175 for app dora in org o / space s as admin...
OK
This command is in EXPERIMENTAL stage and may change without notice
Rolling back to revision 3176 for app dora in org o / space s as admin...
OK
This command is in EXPERIMENTAL stage and may change without notice
Rolling back to revision 3175 for app dora in org o / space s as admin...
memory quota_exceeded
FAILED
This command is in EXPERIMENTAL stage and may change without notice
Rolling back to revision 3176 for app dora in org o / space s as admin...
Unable to rollback. The code and configuration you are rolling back to is the same as the deployed revision.
FAILED
This command is in EXPERIMENTAL stage and may change without notice
Rolling back to revision 3175 for app dora in org o / space s as admin...
memory quota_exceeded
FAILED
```
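The modification was just swapping in revision numbers that survived pruning; a sketch, using the revisions from the output above:

```zsh
# Same alternating loop, targeting revisions that survived pruning (sketch)
i=0
while [ $i -lt 10000 ]; do
  if (( i % 2 == 0 )); then
    cf rollback dora --revision 3176 -f
  else
    cf rollback dora --revision 3175 -f
  fi
  i=$((i + 1))
done
```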
Expected result
Either:
- My deployments fail "gracefully" when the cluster runs out of resources
- I should hit a limit on deployments/app
And:
- There should never be more Processes than Deployments on the app (see the sketch below).
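A hedged sketch of the invariant check this implies, using standard v3 endpoints (the check itself is illustrative, not an existing Cloud Controller feature):

```zsh
# Illustrative check: web process count should never exceed deployment count
guid=$(cf app dora --guid)
procs=$(cf curl "/v3/processes?app_guids=${guid}&types=web&per_page=1" | jq .pagination.total_results)
deps=$(cf curl "/v3/deployments?app_guids=${guid}&per_page=1" | jq .pagination.total_results)
(( procs <= deps )) || echo "violation: ${procs} web processes > ${deps} deployments"
```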
Current result
- ~3000 revisions were created before the script started failing on the client side
- Subsequent attempts to push different apps failed with "Insufficient Resources: insufficient resources" errors
- cf apps and cf app dora took hours to return results
We have created an issue in Pivotal Tracker to manage this:
https://www.pivotaltracker.com/story/show/174940465
The labels on this GitHub issue will be updated when the story is started.