worker_disconnect_timeout ignored sometimes

Open godber opened this issue 7 years ago • 0 comments

While trying to reproduce #893, I scaled the k8s worker deployment to 0, so no worker pods were running. The worker pods go away, and the execution controller logs show that slices stop being processed. But after the timeout period elapses, the execution controller does not exit, not even after many minutes (my timeout was set to 120000, i.e. 120 s).
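To make the expectation concrete, here is a minimal sketch of the behavior I would expect from the timeout: once the worker count drops to zero, a clock starts, and the execution controller should shut down if no worker reconnects within the timeout. The class and method names (`DisconnectWatchdog`, `update`, `shouldShutdown`) are hypothetical and only illustrate the expected logic; this is not the actual Teraslice implementation.

```typescript
// Hypothetical sketch of the expected timeout behavior.
// None of these names come from the Teraslice code base.
class DisconnectWatchdog {
    private timeoutMs: number;
    // Timestamp (ms) at which the worker count first hit zero, or null.
    private zeroWorkersSince: number | null = null;

    constructor(timeoutMs: number) {
        this.timeoutMs = timeoutMs;
    }

    // Record the connected worker count observed at time `nowMs`.
    update(workerCount: number, nowMs: number): void {
        if (workerCount > 0) {
            // A worker is connected, so the countdown is cancelled.
            this.zeroWorkersSince = null;
        } else if (this.zeroWorkersSince === null) {
            // First observation with no workers: start the countdown.
            this.zeroWorkersSince = nowMs;
        }
    }

    // True once there have been no workers for at least `timeoutMs`.
    shouldShutdown(nowMs: number): boolean {
        return this.zeroWorkersSince !== null
            && nowMs - this.zeroWorkersSince >= this.timeoutMs;
    }
}

// With a 120000 ms timeout and all workers gone at t = 0:
const watchdog = new DisconnectWatchdog(120000);
watchdog.update(0, 0);
console.log(watchdog.shouldShutdown(60000));  // false, still inside the window
console.log(watchdog.shouldShutdown(120000)); // true, timeout has elapsed
```

What I observe instead is that the equivalent of `shouldShutdown` almost never seems to take effect after the deployment is scaled to 0.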

I did see it exit on its own once in about 8 tries. I can show you how to reproduce this.

I am not convinced we really want this timeout to behave this way in k8s anyway, though maybe we do.

Feb 12 '19 23:02 godber