terraform-aws-github-runner
Feature Request: Adding job runtimes to webhook logs
Discussed in https://github.com/philips-labs/terraform-aws-github-runner/discussions/3686
Originally posted by winwinashwin December 23, 2023
Motivation
I would like to monitor our CI runtimes at the job level across our organization. A simple approach would be to write a CloudWatch query that filters the job logs from the webhook Lambda and graphs the runtimes. The completed event logs include the started_at and completed_at datetime fields, but according to the official docs, the CloudWatch query syntax does not currently support parsing human-readable timestamps.
Computing the runtime within the Lambda and adding it as a field to the logs would allow CI runtimes to be monitored natively in CloudWatch, with no extra tooling.
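A minimal sketch of the proposed computation, in TypeScript. It relies only on the started_at and completed_at fields of GitHub's workflow_job webhook payload (completed_at is null until the job finishes). The `WorkflowJobPayload` type and the `jobRuntimeSeconds` helper are hypothetical names for illustration, not part of the module.

```typescript
// Minimal subset of the GitHub `workflow_job` webhook payload used here.
interface WorkflowJobPayload {
  action: string;
  workflow_job: {
    id: number;
    name: string;
    started_at: string;
    completed_at: string | null;
  };
}

// Hypothetical helper: returns the job runtime in seconds, or undefined
// when the job has not completed or a timestamp cannot be parsed.
function jobRuntimeSeconds(payload: WorkflowJobPayload): number | undefined {
  const { started_at, completed_at } = payload.workflow_job;
  if (payload.action !== 'completed' || !completed_at) {
    return undefined;
  }
  const start = Date.parse(started_at); // ISO 8601 -> epoch milliseconds
  const end = Date.parse(completed_at);
  if (Number.isNaN(start) || Number.isNaN(end)) {
    return undefined;
  }
  return (end - start) / 1000;
}

// Example: emit a structured JSON log line with a numeric runtime field.
const example: WorkflowJobPayload = {
  action: 'completed',
  workflow_job: {
    id: 1,
    name: 'build',
    started_at: '2023-12-23T10:00:00Z',
    completed_at: '2023-12-23T10:01:30Z',
  },
};
console.log(JSON.stringify({ jobId: example.workflow_job.id, runtimeSeconds: jobRuntimeSeconds(example) }));
// logs {"jobId":1,"runtimeSeconds":90}
```

Because CloudWatch Logs Insights discovers fields in JSON log lines automatically, a numeric field like this could be aggregated with a stats query directly, with no timestamp parsing in the query.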
If approved, I would love to open a PR for this feature. I believe this should be a straightforward addition, right about here
@npalm, any thoughts on this?
In general, the idea is that the runner part only takes care of running jobs, to ensure all capacity goes to scaling and making runners available. However, we also need to monitor the behavior. For that, we introduced the option to deliver the events to a secondary queue, which can be used for analytics. This feature is marked as experimental.
The longer-term goals are:
- to move to EventBridge, to allow routing events for different purposes
- to provide an analytics option as part of, or alongside, this module
So, if I need to monitor my CI infrastructure, your recommendation is to enable the experimental secondary queue, consume the events, and log the computed metrics separately?
This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed if no further activity occurs. Thank you for your contributions.
@winwinashwin, that is the only option we offer today. However, I hope we find some time to refactor and move to EventBridge.
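The workaround confirmed above (enable the experimental secondary queue, consume its events, and log computed metrics separately) could look roughly like the following Lambda consumer. This is a sketch under stated assumptions: it assumes each queue message body is the JSON-encoded GitHub workflow_job webhook payload, which should be verified against the module's documentation, and it defines minimal local stand-ins for the SQS event types instead of importing them from @types/aws-lambda. `extractJobMetrics`, `JobMetric`, and the other type names are hypothetical.

```typescript
// Minimal local stand-ins for the AWS Lambda SQS event types.
interface SqsRecord {
  messageId: string;
  body: string;
}
interface SqsEvent {
  Records: SqsRecord[];
}

// ASSUMPTION: each message body is the JSON workflow_job webhook payload.
interface JobMessage {
  action?: string;
  workflow_job?: {
    id: number;
    name: string;
    started_at: string;
    completed_at: string | null;
  };
}

interface JobMetric {
  jobId: number;
  jobName: string;
  runtimeSeconds: number;
}

function parseMessage(body: string): JobMessage | undefined {
  try {
    return JSON.parse(body) as JobMessage;
  } catch {
    return undefined; // not valid JSON
  }
}

// Pure function: extracts runtimes for completed jobs, skipping anything else.
function extractJobMetrics(event: SqsEvent): JobMetric[] {
  const metrics: JobMetric[] = [];
  for (const record of event.Records) {
    const msg = parseMessage(record.body);
    const job = msg?.workflow_job;
    if (msg?.action !== 'completed' || !job || !job.completed_at) {
      continue;
    }
    const runtimeMs = Date.parse(job.completed_at) - Date.parse(job.started_at);
    if (Number.isNaN(runtimeMs)) {
      continue; // unparsable timestamp
    }
    metrics.push({ jobId: job.id, jobName: job.name, runtimeSeconds: runtimeMs / 1000 });
  }
  return metrics;
}

// Hypothetical analytics Lambda: one structured JSON log line per completed job.
export async function handler(event: SqsEvent): Promise<void> {
  for (const metric of extractJobMetrics(event)) {
    console.log(JSON.stringify(metric));
  }
}
```

Keeping the parsing in a pure function, separate from the handler, lets the metric logic be tested without any AWS infrastructure.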