turbinia
Implement turbiniamgmt workercheck command
Right now we don't have a way to check how many workers are connected, and this can be non-trivial because workers can connect from anywhere. We should implement a 'turbiniactl workercheck' (or similar) command to run a quick check on all the workers. PSQ has a Broadcast worker mechanism that we can use for this.
We can adapt and use the existing worker_stat.py task code for this.
Bump. When there are unexpected workers listening to the same pubsub channel, it's pretty annoying to debug. We should also add a 'worker kill' command so that we can do something about it remotely.
I chatted with @alimez about this, and I think rather than having a worker check command be implemented as a task, we'll have the workers write heartbeat/timestamp information into datastore keyed by their hostnames. This will allow us to see what workers are running and where. We can also use this to do some basic monitoring like in https://github.com/google/turbinia/pull/509 .
Cc: @wajihyassine for the monitoring angle since we could potentially use this in a dashboard.
We should still have a 'turbiniactl workercheck' command or similar so that we can read out the heartbeat information easily. If we follow the same pattern as other status commands, we should have a corresponding cloud function that actually does the work.
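To make the heartbeat idea concrete, here is a rough sketch of what the write and read sides could look like. It is only an illustration: a plain dict stands in for Datastore, and the names `write_heartbeat`, `worker_status` and `max_age` are made up for this example and are not existing Turbinia code.

```python
import socket
import time


def write_heartbeat(store, hostname=None, now=None):
    """Record a timestamp for this worker, keyed by its hostname.

    `store` is a hypothetical stand-in for the real datastore client;
    here it is just a dict mapping hostname -> last-seen timestamp.
    """
    hostname = hostname or socket.gethostname()
    store[hostname] = time.time() if now is None else now
    return hostname


def worker_status(store, max_age=60.0, now=None):
    """Return {hostname: is_alive} based on heartbeat age in seconds.

    A worker counts as alive if its last heartbeat is at most
    `max_age` seconds old (an assumed threshold, not a Turbinia default).
    """
    now = time.time() if now is None else now
    return {host: (now - seen) <= max_age for host, seen in store.items()}
```

A workercheck command would then only need to call something like `worker_status` and print the result, and the same data could feed a dashboard.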
@aarontp Is this something you'd still want to add to the new client? If so, pls assign to me.
@jleaniz I do think that we should have some kind of functionality that gives us insight into the actual running workers. Now that we only use Celery, I think we can use its built-in functionality for this instead of maintaining a separate heartbeat method. We had also talked out of band about using the Celery web UI, so we could potentially use that instead if that would be easier.
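As a rough sketch of the Celery route, the remote-control API can already ping workers and ask one to shut down. The helper names `ping_workers` and `shutdown_worker` below are hypothetical, and how the Celery app object is obtained inside Turbinia is left as an assumption; the helpers just take it as an argument.

```python
def ping_workers(app, timeout=2.0):
    """Return the sorted names of workers that answered a ping.

    `app` is assumed to be a configured celery.Celery instance.
    inspect().ping() returns a dict keyed by worker name, or None
    when no worker replies within the timeout.
    """
    replies = app.control.inspect(timeout=timeout).ping() or {}
    return sorted(replies)


def shutdown_worker(app, worker_name):
    """Ask a single worker to shut down via Celery remote control.

    This only covers the 'worker kill' idea from earlier in the thread
    for workers listening on the same broker.
    """
    app.control.shutdown(destination=[worker_name])
```

Worker names come back in Celery's usual `name@hostname` form, so this would also show where each worker is running without us maintaining extra state.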