
Implement turbiniamgmt workercheck command

Open aarontp opened this issue 7 years ago • 6 comments

Right now we don't have a way to check how many workers are connected, and this can be non-trivial because workers can connect from anywhere. We should implement a 'turbiniactl workercheck' (or similar) command to run a quick check on all the workers. PSQ has a Broadcast worker mechanism that we can use for this.

aarontp avatar Oct 21 '17 17:10 aarontp

We can adapt and use the existing worker_stat.py task code for this.

aarontp avatar Nov 08 '17 22:11 aarontp

Bump. When there are unexpected workers listening to the same pubsub channel, it's pretty annoying to debug. We should also add a 'worker kill' command so that we can do something about it remotely.

aarontp avatar Feb 09 '19 00:02 aarontp

I chatted with @alimez about this, and I think rather than having a worker check command be implemented as a task, we'll have the workers write heartbeat/timestamp information into datastore keyed by their hostnames. This will allow us to see what workers are running and where. We can also use this to do some basic monitoring like in https://github.com/google/turbinia/pull/509 .

Cc: @wajihyassine for the monitoring angle since we could potentially use this in a dashboard.
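A rough sketch of the heartbeat idea, to make the shape concrete. A plain dict stands in for Datastore here, and the names `record_heartbeat`, `worker_status` and the 60-second `max_age` threshold are all hypothetical and not existing Turbinia code:

```python
import socket
import time


def record_heartbeat(store, hostname=None, now=None):
    """Write the current timestamp into the store, keyed by worker hostname.

    `store` is any dict-like object; a real implementation would write to
    Datastore instead (assumption, not shown here).
    """
    hostname = hostname or socket.gethostname()
    store[hostname] = time.time() if now is None else now
    return hostname


def worker_status(store, max_age=60.0, now=None):
    """Classify each worker as 'alive' or 'stale' by the age of its heartbeat."""
    now = time.time() if now is None else now
    return {
        host: ('alive' if now - timestamp <= max_age else 'stale')
        for host, timestamp in store.items()
    }
```

Each worker would call something like `record_heartbeat()` periodically, and a status command or dashboard would only need to read the store and apply `worker_status()`.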

aarontp avatar May 13 '20 19:05 aarontp

We should still have a turbiniactl workercheck command or similar so that we can read out the heartbeat information easily. If we follow the same pattern as other status commands we should have a corresponding cloud function that actually does the work.

aarontp avatar May 13 '20 20:05 aarontp

@aarontp Is this something you'd still want to add to the new client? If so, pls assign to me.

jleaniz avatar Jan 10 '23 16:01 jleaniz

@jleaniz I do think that we should have some kind of functionality that gives us insight into the actual running workers. Now that we only use Celery, I think we can use its functionality for this instead of maintaining a separate heartbeat method. We had also talked out of band about using the Celery web UI, so we could potentially use that instead if that would be easier.
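For reference, a minimal sketch of what a Celery-based check could look like. It relies on Celery's `app.control.inspect().ping()`, which returns a dict mapping worker names to `{'ok': 'pong'}` replies, or `None` when no worker answers within the timeout. The helper names `responding_workers` and `check_workers` are hypothetical, and how the client would obtain the Celery app object is left open:

```python
def responding_workers(replies):
    """Return the sorted names of workers that answered a Celery ping.

    `replies` is the value returned by `app.control.inspect().ping()`:
    a dict such as {'celery@host1': {'ok': 'pong'}}, or None if no
    worker replied before the timeout.
    """
    if not replies:
        return []
    return sorted(
        name for name, reply in replies.items() if reply.get('ok') == 'pong'
    )


def check_workers(app, timeout=2.0):
    """Ping all workers attached to the given Celery app and list responders."""
    return responding_workers(app.control.inspect(timeout=timeout).ping())


# Usage (assumes a configured Celery app and a reachable broker):
#   from celery import Celery
#   app = Celery(broker='redis://localhost:6379/0')
#   print(check_workers(app))
```

This only shows which workers respond at the moment of the call; it does not keep any history the way a stored heartbeat would.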

aarontp avatar Jan 10 '23 23:01 aarontp