distributed
Adjust the bad-worker threshold on the worker table based on worker-ttl
We currently mark workers as red on the worker table page if they don't heartbeat within 60s. The 60s threshold is quite arbitrary, and it can make for a bad UX, particularly when users expect heartbeats to be delayed (e.g. because user code is holding the GIL).
The 60s threshold is currently hard-coded here:
https://github.com/fjetter/distributed/blob/28804b72acc210935c061a0d68d46d4a6ae50a94/distributed/http/templates/worker-table.html#L19
It is not obvious to users what the threshold is or how to control this behavior.
Dask considers these workers dead once their last-seen time exceeds distributed.scheduler.worker-ttl (default: 5 minutes), and I feel this parameter should influence the coloring of the row.
Maybe something like yellow/orange at 25% of worker-ttl and red at 50%, instead of hard-coding 60s.
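A minimal sketch of the proposed rule, assuming worker-ttl has already been converted to seconds (in distributed this could be done with dask.utils.parse_timedelta(dask.config.get("distributed.scheduler.worker-ttl")), which yields None when the TTL is disabled). The function name, the "ok"/"warn"/"bad" labels, and the fallback to the current 60s rule when worker-ttl is disabled are all hypothetical choices for illustration, not existing distributed code.

```python
from typing import Optional

# Hypothetical fractions taken from the proposal above: 25% -> warn, 50% -> bad.
WARN_FRACTION = 0.25
BAD_FRACTION = 0.50
# Today's hard-coded threshold; used here (an assumption) only when worker-ttl is disabled.
LEGACY_BAD_SECONDS = 60.0


def worker_row_status(seconds_since_heartbeat: float, worker_ttl: Optional[float]) -> str:
    """Return a hypothetical status label for one row of the worker table."""
    if worker_ttl is None or worker_ttl <= 0:
        # No TTL configured: keep the current behavior (red after 60s).
        return "bad" if seconds_since_heartbeat > LEGACY_BAD_SECONDS else "ok"
    if seconds_since_heartbeat > BAD_FRACTION * worker_ttl:
        return "bad"
    if seconds_since_heartbeat > WARN_FRACTION * worker_ttl:
        return "warn"
    return "ok"


# With the default worker-ttl of 5 minutes (300s): warn after 75s, red after 150s.
for delay in (10, 80, 200):
    print(delay, worker_row_status(delay, 300))
```

With the default TTL, a worker that has been silent for 61s would no longer turn red, which addresses the GIL-holding case while still flagging workers well before the scheduler removes them at 300s.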
@fjetter Will take a stab at this!
@jaabberwocky Of course, let me know if there are any questions!