terraform-aws-github-runner

Feature Request: Adding job runtimes to webhook logs

Open winwinashwin opened this issue 1 year ago • 4 comments

Discussed in https://github.com/philips-labs/terraform-aws-github-runner/discussions/3686

Originally posted by winwinashwin December 23, 2023

Motivation

I would like to monitor our CI runtimes at the job level across our organization. A simple way to do this would be to craft a CloudWatch query that filters the job logs from the webhook lambda and graphs the runtimes. The completed event logs have the started_at and completed_at datetime fields, but CloudWatch query syntax does not support parsing human-readable timestamps as of today (according to the official docs).

Computing the runtime within the lambda and adding this field to the logs would allow monitoring CI runtimes natively in CloudWatch without any hassle.
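
As a rough illustration of the idea, the runtime could be derived from the two timestamps before the event is logged. This is only a sketch: the WorkflowJobTimes interface, the jobRuntimeSeconds helper, and the runtimeSeconds log field are hypothetical names chosen for this example, not existing code in the module.

```typescript
// Hypothetical subset of the workflow_job payload; only the fields used here.
interface WorkflowJobTimes {
  started_at: string | null;
  completed_at: string | null;
}

// Returns the job runtime in whole seconds, or undefined when either
// timestamp is missing or cannot be parsed.
function jobRuntimeSeconds(job: WorkflowJobTimes): number | undefined {
  if (!job.started_at || !job.completed_at) return undefined;
  const started = Date.parse(job.started_at);
  const completed = Date.parse(job.completed_at);
  if (Number.isNaN(started) || Number.isNaN(completed)) return undefined;
  return Math.round((completed - started) / 1000);
}

// Example: a job that ran for 1 minute 35 seconds.
const runtime = jobRuntimeSeconds({
  started_at: "2023-12-23T17:00:00Z",
  completed_at: "2023-12-23T17:01:35Z",
});

// Emit the value as a numeric field so a log query can aggregate it directly.
// The field name runtimeSeconds is an assumption made for this sketch.
console.log(JSON.stringify({ message: "workflow_job completed", runtimeSeconds: runtime }));
```

With a numeric field like this in the structured log line, a CloudWatch Logs Insights query should be able to aggregate it (for example with stats avg(runtimeSeconds) by bin(1h)) without parsing any timestamps.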


If approved, I would love to open a PR for this feature. I believe this should be a straightforward addition right about here

winwinashwin avatar Dec 23 '23 17:12 winwinashwin

@npalm Any thoughts on this?

winwinashwin avatar Dec 28 '23 09:12 winwinashwin

In general, the idea is to let the runner part take care only of running jobs. This ensures that all capacity is used for scaling and for making runners available. However, we also need to monitor the behavior. For that, we have introduced the option to deliver the events to a secondary queue, which can be used for analytics. This feature is marked as experimental.

The longer-term goal is

  • to move to the event bridge, to allow routing of events for different purposes
  • to provide an analytics option as part of, or alongside, this module as well

npalm avatar Jan 12 '24 09:01 npalm

So if I need to monitor my CI infra, you recommend that I enable the experimental secondary queue, consume the events, and log computed metrics separately?
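
For illustration, such a consumer might look roughly like the sketch below. It assumes that each message body on the secondary queue is the raw GitHub workflow_job webhook payload as JSON; the thread does not confirm the actual message format, and QueueRecord, JobMetric, toJobMetric, and handler are hypothetical names used only for this example.

```typescript
// Minimal shape of a queue record; only the message body is used.
// Assumption: the body holds the raw workflow_job webhook payload as JSON.
interface QueueRecord {
  body: string;
}

// Hypothetical metric emitted once per completed job.
interface JobMetric {
  jobId: number;
  name: string;
  runtimeSeconds: number;
}

// Converts one record into a metric, or undefined when the record is not a
// completed workflow_job event or its timestamps are unusable.
function toJobMetric(record: QueueRecord): JobMetric | undefined {
  let payload: any;
  try {
    payload = JSON.parse(record.body);
  } catch (_err) {
    return undefined;
  }
  const job = payload?.workflow_job;
  if (payload?.action !== "completed" || !job?.started_at || !job?.completed_at) {
    return undefined;
  }
  const elapsedMs = Date.parse(job.completed_at) - Date.parse(job.started_at);
  if (Number.isNaN(elapsedMs)) return undefined;
  return { jobId: job.id, name: job.name, runtimeSeconds: Math.round(elapsedMs / 1000) };
}

// Sketch of a handler: logs one structured line per completed job in the batch
// and returns the metrics it produced.
function handler(event: { Records: QueueRecord[] }): JobMetric[] {
  const metrics: JobMetric[] = [];
  for (const record of event.Records) {
    const metric = toJobMetric(record);
    if (metric) {
      console.log(JSON.stringify(metric));
      metrics.push(metric);
    }
  }
  return metrics;
}
```

Keeping the conversion in a pure function separate from the handler makes it easy to test without a queue, and records that are not completed jobs are simply skipped rather than treated as errors.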

winwinashwin avatar Jan 15 '24 06:01 winwinashwin

This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed if no further activity occurs. Thank you for your contributions.

github-actions[bot] avatar Feb 15 '24 01:02 github-actions[bot]

@winwinashwin that is the only option we offer today. However, I hope we find some time to refactor and move to the event bridge.

npalm avatar Mar 06 '24 19:03 npalm