dd-trace-go

chore(.github/workflows): cache datadog-agent image (POC for general Docker images caching)

Open darccio opened this issue 4 months ago • 3 comments

What does this PR do?

Caches Docker images, starting with datadog-agent.

It also improves the DRYness of our workflows by using CUE.
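
For readers unfamiliar with the general idea, here is a minimal sketch of one common way to cache a Docker image in GitHub Actions: save the image as a tarball, keep the tarball in the Actions cache, and load it on later runs instead of pulling from Docker Hub. This is not necessarily how this PR does it. The job name, step ids, tarball path, cache key, and the `AGENT_TAG` value are all illustrative assumptions, and the sketch assumes the image is started from a step rather than from a `services:` block.

```yaml
# Hypothetical workflow fragment; every name, path, and tag here is an assumption.
jobs:
  cache-agent-image:
    runs-on: ubuntu-latest
    env:
      AGENT_TAG: "7" # placeholder tag, not taken from the PR
    steps:
      # Restore the tarball if a cache entry with this key already exists.
      - name: Restore cached image tarball
        id: agent-cache
        uses: actions/cache@v4
        with:
          path: /tmp/images/datadog-agent.tar
          key: docker-datadog-agent-${{ env.AGENT_TAG }}

      # Cache miss: pull once and save the image so the cache can store it.
      - name: Pull and save image
        if: steps.agent-cache.outputs.cache-hit != 'true'
        run: |
          mkdir -p /tmp/images
          docker pull "datadog/agent:${AGENT_TAG}"
          docker save -o /tmp/images/datadog-agent.tar "datadog/agent:${AGENT_TAG}"

      # Cache hit: load the image locally without contacting Docker Hub.
      - name: Load image from tarball
        if: steps.agent-cache.outputs.cache-hit == 'true'
        run: docker load -i /tmp/images/datadog-agent.tar
```

Keying the cache on the image tag means a version bump produces a new cache entry, while unchanged versions are served from the cache.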

Motivation

We run 12 jobs, one per combination of contrib set and supported Go version, and each of them pulls up to 18 images, over and over. This causes:

  • Slow CI: services initialization averages around 2 minutes and 30 seconds each time it runs.
  • Rate limiting: the pull request tests of a single PR alone hit Docker Hub 216 times (12 jobs × 18 images). Any increase in the number of PRs running CI will trigger rate limiting and fail our pipelines.

Reviewer's Checklist

  • [ ] Changed code has unit tests for its functionality at or near 100% coverage.
  • [ ] System-Tests covering this feature have been added and enabled with the va.b.c-dev version tag.
  • [ ] There is a benchmark for any new code, or changes to existing code.
  • [ ] If this interacts with the agent in a new way, a system test has been added.
  • [ ] New code is free of linting errors. You can check this by running ./scripts/lint.sh locally.
  • [ ] Add an appropriate team label so this PR gets put in the right place for the release notes.
  • [ ] Non-trivial go.mod changes, e.g. adding new modules, are reviewed by @DataDog/dd-trace-go-guild.

Unsure? Have a question? Request a review!

darccio, Aug 22 '25 11:08

⚠️ Tests

⚠️ Warnings

🧪 1 Test failed

TestTracesAgentIntegration from github.com/DataDog/dd-trace-go/v2/ddtrace/tracer (Datadog)
Failed

=== RUN   TestTracesAgentIntegration
    transport_test.go:92: 
        	Error Trace:	/home/runner/work/dd-trace-go/dd-trace-go/ddtrace/tracer/transport_test.go:92
        	Error:      	Received unexpected error:
        	            	Post "http://localhost:8126/v0.4/traces": dial tcp [::1]:8126: connect: connection refused
        	Test:       	TestTracesAgentIntegration
--- FAIL: TestTracesAgentIntegration (0.00s)
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
...

ℹ️ Info

❄️ No new flaky tests detected

This comment will be updated automatically if new data arrives.
🔗 Commit SHA: 93f203b

Benchmarks

Benchmark execution time: 2025-08-26 11:43:00

Comparing candidate commit 93f203b0ccfafcc371ddb10e73766dc0fe14517b in PR branch dario.castane/ktlo/download-agent-once-run-multiple-times with baseline commit 0441ec41104901fcf192d1c13ca9df5c1636721f in branch dario.castane/ktlo/disable-main-branch-ci.

Found 0 performance improvements and 0 performance regressions! Performance is the same for 24 metrics, 0 unstable metrics.

pr-commenter[bot], Aug 22 '25 15:08

It is definitely better for the services workflow. But for the pull-request and unit-integration workflows, I'm not sure.

I agree, but I didn't want to take on a full refactor yet. My focus was on the services workflow and on avoiding duplicated versions scattered around.

I definitely see benefits in using CUE, although it's a bit complex. See what I had to do to convert #Service to #Image so the service definition could be reused. Once it's set up, it just works.

darccio, Aug 26 '25 10:08