turbinia
Improve Task progress tracking
Now that we have moved to using only Celery, we have more options for how a Task reports updates while the Worker is executing.
References:
- https://docs.celeryq.dev/en/stable/userguide/calling.html#on-message
- https://docs.celeryq.dev/en/stable/userguide/tasks.html#custom-states
Using custom states and the `update_state` method, come up with a better way to track the progress of a running Task. This may have to differ for each Task/Job depending on what it is checking for and how long it runs, but some ideas are:
- Can have the progress metric be based on output file size for larger tasks such as Plaso
- Can have a default progress metric for Tasks that should be quick in nature (checking existence of a file)
- Can take the external program's latest stdout and provide it in the status update
- Can take the latest modified timestamp of the output file
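
A few of the ideas above could be combined into one progress payload. The following is only a rough sketch, not a proposed implementation: `build_progress_meta`, `report_progress`, the `expected_size` estimate, and the `PROGRESS` state name are all hypothetical. The only Celery API assumed is `update_state(state=..., meta=...)` on a bound task, which is passed in as a callable so the helper has no hard dependency on Celery.

```python
import os


def build_progress_meta(output_path, expected_size=None, last_stdout=None):
    """Collect progress hints for a running Task from its output file.

    expected_size is a hypothetical estimate of the final output size in
    bytes; without it no percentage is reported.
    """
    meta = {
        'output_size': 0,
        'output_mtime': None,
        'percent': None,
        'last_stdout': last_stdout,
    }
    try:
        stat = os.stat(output_path)
    except FileNotFoundError:
        # The external program has not created its output yet.
        return meta
    meta['output_size'] = stat.st_size
    meta['output_mtime'] = stat.st_mtime
    if expected_size:
        # Cap at 100 because the size estimate may be too low.
        meta['percent'] = min(100.0, 100.0 * stat.st_size / expected_size)
    return meta


def report_progress(update_state, output_path, expected_size=None,
                    last_stdout=None):
    """Send a custom-state update through the given update_state callable.

    Inside a Celery task declared with bind=True, update_state would be
    self.update_state. 'PROGRESS' is an arbitrary custom state name.
    """
    meta = build_progress_meta(output_path, expected_size, last_stdout)
    update_state(state='PROGRESS', meta=meta)
    return meta
```

A long-running Task could call `report_progress(self.update_state, ...)` periodically while the external program runs, and a quick Task could skip the file checks and send a fixed default instead.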
Another related idea we have talked about is allowing the Task to write something like a `status.txt` file into its output directory and, if that file exists, using its contents as the current status. Plaso has implemented a short output that we can use for this.
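
Reading such a status file could look roughly like the sketch below. It assumes the file is plain text in the Task's output directory and that the last non-empty line is the current status. The helper name `read_status_file` and the `max_bytes` limit are hypothetical, and the format of Plaso's short output is not assumed here.

```python
import os


def read_status_file(output_dir, filename='status.txt', max_bytes=4096):
    """Return the last non-empty line of the status file, or None.

    Only the final max_bytes of the file are read, so a status file that
    keeps growing stays cheap to poll.
    """
    path = os.path.join(output_dir, filename)
    try:
        with open(path, 'rb') as handle:
            handle.seek(0, os.SEEK_END)
            size = handle.tell()
            handle.seek(max(0, size - max_bytes))
            tail = handle.read().decode('utf-8', errors='replace')
    except FileNotFoundError:
        # No status file: the caller should fall back to another metric.
        return None
    lines = [line.strip() for line in tail.splitlines() if line.strip()]
    return lines[-1] if lines else None
```

When the function returns `None`, the Worker could fall back to a different metric, such as output file size or a default status.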