Leaking droplets

Open · drnic opened this issue 10 years ago · 4 comments

I'm going to raise this issue here as I can't reproduce it deploying apps normally.

I am running the `cf dea-ads` command from https://github.com/cloudfoundry/tools-cf-plugin/ to see how many droplets the runners (DEAs) think they are hosting.

With a fresh CF it starts at 0 (obviously).

I ran pat for 4 iterations (`/pat -workload=gcf:push -iterations=4 -concurrency=2`), after which there were 7 droplets. When I deleted the 4 apps, the count dropped back to 3 droplets.

It should be 0; not 3.

I ran pat again for 2 iterations (`/pat -workload=gcf:push -iterations=2 -concurrency=2`), and after deleting the apps it created I am now at 5 droplets.

[screenshot]

Can anyone think of why pat might be causing this? Or how CF could be allowing droplets to be created in excess of the apps being pushed?

To be clear, I have 5 droplets and 0 apps:

[screenshot]

/cc @jbayer

drnic · Aug 02 '14 00:08

A few hours later, the rogue droplets have disappeared from the count. I'm not sure exactly how much time passed between opening this ticket and now.

drnic · Aug 02 '14 06:08

@drnic it's likely that the droplet deletion job only kicks in asynchronously, via clock, every so often [1]. I don't know where the config that sets how often the jobs run lives, but I assume it's only every once in a while. /cc @ematpl @dieucao @MarkKropf

[1] https://github.com/cloudfoundry/cloud_controller_ng/blob/master/app/jobs/runtime/droplet_deletion.rb

jbayer · Aug 02 '14 11:08

Hi, @drnic,

As @jbayer pointed out, the app droplet blobs are deleted asynchronously via Delayed::Job, but those jobs are not triggered by the CC's clock mode. When an app is deleted, the deletion cascades to its droplets, each of which then enqueues a DropletDeletion job on the 'cc-generic' queue. The generic-queue workers then work those jobs off, but each worker handles only one job at a time, so it can take a while for all the droplet blobs to be deleted.
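The queueing behaviour described above can be pictured with a toy model. This is a minimal sketch, not Cloud Controller code: it assumes exactly one deletion job per deleted droplet, a fixed pool of workers that each finish one job per tick, and a hypothetical `ticks_until_drained` helper. It only illustrates why a blob count taken right after the apps are deleted can still be non-zero for a while.

```python
from __future__ import annotations

from collections import deque


def ticks_until_drained(num_droplets: int, num_workers: int) -> list[int]:
    """Toy model (hypothetical helper): a FIFO job queue drained by workers.

    Assumptions: each deleted droplet enqueues exactly one deletion job,
    and each worker finishes at most one job per tick.
    Returns how many jobs are still pending after each tick.
    """
    if num_workers < 1:
        raise ValueError("need at least one worker")
    # One deletion job per droplet, enqueued when the app deletion cascades.
    queue = deque(f"droplet-{i}" for i in range(num_droplets))
    pending_after_tick = []
    while queue:
        # Every worker pops and completes at most one job during this tick.
        for _ in range(min(num_workers, len(queue))):
            queue.popleft()
        pending_after_tick.append(len(queue))
    return pending_after_tick


if __name__ == "__main__":
    # 5 droplets drained by 2 workers: the backlog shrinks over three ticks.
    print(ticks_until_drained(5, 2))
```

In a real deployment, job duration and worker count vary, so the drain time is only as predictable as those two numbers.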

On the other hand, that cf plugin does seem to be analyzing the state of the DEAs via their advertisements, not the CC and its blobstore. Can you get any more information about which instances the DEA thinks it has? Its varz endpoint exposes more detailed per-app data about instances in the instance_registry value, so that might be the easiest thing to query first.
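As a starting point for that query, here is a sketch that summarizes a DEA's varz JSON per app. It assumes (unverified) that `instance_registry` maps each app ID to a mapping of instance IDs to attribute dicts containing a `state` field, and that varz sits behind HTTP basic auth. The helper names `fetch_varz` and `count_instances`, along with any URL and credentials you pass in, are hypothetical placeholders to adapt to your deployment.

```python
import json
import urllib.request
from collections import Counter


def fetch_varz(url: str, user: str, password: str) -> dict:
    """Fetch and decode a varz JSON document behind HTTP basic auth.

    Hypothetical helper: the URL and credentials depend on your deployment.
    """
    password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    password_mgr.add_password(None, url, user, password)
    opener = urllib.request.build_opener(
        urllib.request.HTTPBasicAuthHandler(password_mgr)
    )
    with opener.open(url, timeout=10) as resp:
        return json.load(resp)


def count_instances(varz: dict) -> dict:
    """Return {app_id: Counter(state -> count)} from varz["instance_registry"].

    Assumed layout: {app_id: {instance_id: {"state": ..., ...}, ...}, ...}.
    """
    registry = varz.get("instance_registry", {})
    summary = {}
    for app_id, instances in registry.items():
        # Tally instance states; a missing state is reported as "UNKNOWN".
        summary[app_id] = Counter(
            attrs.get("state", "UNKNOWN") for attrs in instances.values()
        )
    return summary


if __name__ == "__main__":
    sample = {"instance_registry": {"app-a": {"i1": {"state": "RUNNING"}}}}
    print(count_instances(sample))
```

If an app ID that was already deleted still shows up in this summary, that would point at the DEA's own bookkeeping rather than the CC blobstore.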

Thanks, Eric

emalm · Aug 04 '14 05:08

Thanks for the info. I'll try to learn more about the wayward droplets.


View it on GitHub: https://github.com/cloudfoundry-incubator/pat/issues/110#issuecomment-51019885

drnic · Aug 04 '14 17:08