newrelic-quickstarts [Repository] Improve observability of workflows

Summary

The workflows that run on this repository have become much more important for the continued functionality of the quickstarts ecosystem. To support that, we need to be able to track how often our workflows succeed, be notified when they fail, and have the information available to debug issues when they arise.

What we want to know

When validation/submission workflows run
Whether they succeed or fail
In the case of failure
- Relevant information is reported to facilitate debugging

Ideas

Error and info level logging
- Forward logs to NR1?
- Logging library vs custom implementation
Reporting workflow failure when it fails before the validation/submission step
- Currently only reports failure if we get to that step and it fails
Workflow runs as Transactions?

Information to capture

Workflow and job ids
Time/date
The quickstart or install plan that was being validated/submitted
The associated PR
Number of quickstarts or install plans being modified.
For failure:
- The contents of the graphql query
- Associated error messages and codes
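
As a rough sketch of what a single record with the fields above might look like, the following builds a flat event dictionary. The event type name `QuickstartsWorkflowRun`, the attribute names, and the shape of the `errors` argument are all hypothetical; only the `GITHUB_WORKFLOW`, `GITHUB_RUN_ID`, and `GITHUB_JOB` environment variables come from GitHub Actions itself.

```python
import os
from datetime import datetime, timezone


def build_workflow_event(status, item_name, pr_number, modified_count,
                         graphql_query=None, errors=None, env=None):
    """Collect the fields listed above into one flat event dictionary."""
    env = os.environ if env is None else env
    event = {
        "eventType": "QuickstartsWorkflowRun",  # hypothetical event type name
        "workflow": env.get("GITHUB_WORKFLOW"),
        "runId": env.get("GITHUB_RUN_ID"),
        "jobId": env.get("GITHUB_JOB"),
        "timestamp": int(datetime.now(timezone.utc).timestamp()),
        "status": status,                # "success" or "failure"
        "itemName": item_name,           # quickstart or install plan being processed
        "prNumber": pr_number,
        "modifiedCount": modified_count,
    }
    if status == "failure":
        # Failure-only details: the query that was sent and what came back.
        event["graphqlQuery"] = graphql_query
        event["errorMessages"] = "; ".join(
            f"{e['code']}: {e['message']}" for e in (errors or [])
        )
    return event
```

Keeping the record flat (strings and numbers only) is a deliberate choice here so that it could be sent as a custom event or printed as a single log line.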

Possible Solutions

Acceptance Criteria

add extra logging to stdout (for example, via console.log).
remove the use of the APM agent for the repo, and use custom events & logs instead.
add an always-running step at the end of the workflow to send status back to New Relic.
validate that this data is reporting into the NEW DevEn account.
update the dashboards that key off the APM events (Quickstart repo workflows).

Mar 02 '22 22:03 aswanson-nr

@aswanson-nr 👋

Mar 29 '22 17:03 jpvajda

we discussed this and decided to break it down into smaller steps. a first pass would be:

add logging to stdout
add an always-running step at the end of the workflow to send status back to New Relic.
validate that this data is reporting into the NEW DevEn account
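
One possible shape for the status-reporting step is sketched below. It assumes the status is sent as a custom event through the New Relic Event API; the endpoint URL and the `Api-Key` header reflect my understanding of that API and should be checked against the current docs, and the function names are hypothetical.

```python
import json
import urllib.request

# Assumed US-region Event API endpoint; confirm before relying on it.
EVENT_API_URL = "https://insights-collector.newrelic.com/v1/accounts/{account_id}/events"


def build_status_request(account_id, api_key, event):
    """Build (but do not send) the POST request carrying one custom event."""
    body = json.dumps([event]).encode("utf-8")
    return urllib.request.Request(
        EVENT_API_URL.format(account_id=account_id),
        data=body,
        headers={"Content-Type": "application/json", "Api-Key": api_key},
        method="POST",
    )


def send_status(request, timeout=10):
    """Send the request and return the HTTP status code."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

In the workflow itself, the step that calls this would need `if: always()` so that it still runs when an earlier step fails, which also covers failures that happen before the validation/submission step.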

Mar 30 '22 17:03 jpvajda

this is just a general thought after some light thinking, about extra output we might want to log in the future:

anything that would let us write/improve tests. as an example, we have functions validating graphql responses, etc. if we log what those responses are, we can then use that data in unit tests to catch errors in the future.

example failure: https://github.com/newrelic/newrelic-quickstarts/runs/5186615982?check_suite_focus=true.

in that instance, the information we have in the output isn't helpful. it would be helpful to include the whole response body from graphql, as well as the whole request body so that we can easily reproduce the failure -- and maybe write some tests using that info.
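
a minimal sketch of what that could look like, assuming the request and response bodies are available as plain dictionaries. the logger name and the `graphql_exchange` prefix are made up for illustration:

```python
import json
import logging
import sys


def make_logger(stream=None):
    """Create a logger that writes level-prefixed lines to stdout by default."""
    logger = logging.getLogger("quickstarts.graphql")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    logger.handlers.clear()
    handler = logging.StreamHandler(stream or sys.stdout)
    handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
    logger.addHandler(handler)
    logger.propagate = False
    return logger


def log_graphql_exchange(logger, request_body, response_body):
    """Log the whole request and response; return True if the response has errors."""
    failed = bool(response_body.get("errors"))
    record = json.dumps(
        {"request": request_body, "response": response_body}, sort_keys=True
    )
    # Failures go out at ERROR level, everything else at INFO.
    logger.log(logging.ERROR if failed else logging.INFO, "graphql_exchange %s", record)
    return failed
```

because each exchange is a single line of JSON, it can be copied out of the workflow output and dropped into a unit test as a fixture.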

Mar 30 '22 18:03 moonlight-komorebi

another thought:

for this workflow failure -- https://github.com/newrelic/newrelic-quickstarts/runs/5779399220?check_suite_focus=true -- we fail on the install plan step and don't get to the quickstart step.

if possible, it would be useful to know which install plans and quickstarts failed to submit or are blocked by this issue until it's fixed -- that is, if this were to succeed, which quickstarts and install plans would be updated. this would help us prioritize fixing the failure more appropriately.

Apr 06 '22 17:04 moonlight-komorebi

Something like this would help us triage and prioritize the failure:

[!] The following (1) install plans are impacted by this:
    * foobar
    
[!] The following (3) quickstarts are impacted by this:
    * aws/aws-ec2
    * aws/aws-dynamo-db
    * apache

We could determine the install plans and quickstarts impacted by this, but it would require a lot of manual work each time.
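
A small sketch of how the report above could be generated, assuming the workflow already has the two lists of impacted names (how they are computed is left out here). The function name is hypothetical:

```python
def format_impact_report(install_plans, quickstarts):
    """Render the impacted install plans and quickstarts in the format shown above."""
    sections = []
    for label, names in (("install plans", install_plans), ("quickstarts", quickstarts)):
        lines = [f"[!] The following ({len(names)}) {label} are impacted by this:"]
        lines += [f"    * {name}" for name in names]
        sections.append("\n".join(lines))
    # Separate the two sections with a blank line.
    return "\n\n".join(sections)


if __name__ == "__main__":
    print(format_impact_report(
        ["foobar"],
        ["aws/aws-ec2", "aws/aws-dynamo-db", "apache"],
    ))
```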

Apr 06 '22 17:04 zstix

Old issues will be closed after 105 days of inactivity. This issue has been quiet for 90 days and is being marked as stale. Reply here to keep this issue open.

Jul 06 '22 02:07 github-actions[bot]

This issue is being closed due to inactivity. Is this a mistake? Please re-open this issue or create a new one.

Dec 07 '22 02:12 github-actions[bot]
