opentelemetry-collector-contrib icon indicating copy to clipboard operation
opentelemetry-collector-contrib copied to clipboard

[cmd/opampsupervisor]: Implement PackagesAvailable for upgrading agent

Open BinaryFissionGames opened this issue 1 year ago • 31 comments

Description: Implements ReportPackageStatuses and taking PackagesAvailable for upgrading the agent.

The agent will only accept a top-level package with an empty name. This agent must be signed using cosign's keyless signing method (this is how the opentelemetry-collector-releases repository signs its releases).

The signature field must be populated with the resultant base64 encoded cert and signature, both being space separated (signature = b64_cert + " " + b64_signature).

This first implementation only allows online verification; In order to verify the certificate, it must reach out to the public rekor transparency log instance. It also fetches certificates from the internet. Some of these things are configurable through environment variables, but I figure we can parse out how offline signature verification works in a follow-up PR. This basic setup should allow for signature verification for agents that have access to the internet.

This PR also does not revert the collector if it is unhealthy. This will need to be done in a follow up PR. I think we should do it after #34907 is merged, as I imagine the logic will overlap here.

Link to tracking Issue: Closes #34734, Partially solves #33947

Testing:

  • Unit testing, e2e testing

BinaryFissionGames avatar Oct 01 '24 06:10 BinaryFissionGames

@tigrannajaryan I'd appreciate you taking a look at this, even if just to review conformance to the spec.

evan-bradley avatar Oct 04 '24 16:10 evan-bradley

I think we need a design doc or some other form of documentation that explains all the involved parties (cosign, Collector releases, OpAMP Server, Supervisor) and how they dance together to make sure the downloaded binary is what it pretends to be and is safe to use.

Yeah, good call out. I will work on this and get something in our spec that explains everything here.

BinaryFissionGames avatar Oct 16 '24 15:10 BinaryFissionGames

@andykellr or @dpaasman00 do you plan to take over this PR?

tigrannajaryan avatar Oct 31 '24 18:10 tigrannajaryan

@tigrannajaryan Yes, I will be taking this PR over!

dpaasman00 avatar Nov 04 '24 12:11 dpaasman00

@evan-bradley @tigrannajaryan Are you able to take another look when you have a chance? I believe feedback you left for Brandon was addressed, so let me know if there's anything that was missed!

dpaasman00 avatar Nov 08 '24 15:11 dpaasman00

I am most concerned about the hash/signature verification parts. We want to make sure we don't have any vulnerabilities here and don't introduce any in the future.

I think it is important that we have good test coverage for the verification code. Would you be able to add some tests? I can't see coverage results for this PR in codecov. Ideally we want to have full branch coverage for UpdateContent and funcs it calls.

tigrannajaryan avatar Nov 13 '24 17:11 tigrannajaryan

@dpaasman00 ping me when you want me to do another round.

tigrannajaryan avatar Nov 18 '24 21:11 tigrannajaryan

This PR was marked stale due to lack of activity. It will be closed in 14 days.

github-actions[bot] avatar Dec 03 '24 05:12 github-actions[bot]

Planning to get back and address review by the end of the next week, December 13.

dpaasman00 avatar Dec 03 '24 14:12 dpaasman00

This PR was marked stale due to lack of activity. It will be closed in 14 days.

github-actions[bot] avatar Dec 18 '24 05:12 github-actions[bot]

This PR was marked stale due to lack of activity. It will be closed in 14 days.

github-actions[bot] avatar Jan 04 '25 05:01 github-actions[bot]

Closed as inactive. Feel free to reopen if this PR is still being worked on.

github-actions[bot] avatar Jan 19 '25 05:01 github-actions[bot]

@tigrannajaryan This should be ready now for another look!

dpaasman00 avatar Jan 30 '25 13:01 dpaasman00

Please address the failing CI and fix conflicts and mark the PR as ready for review again, thanks!

atoulme avatar Mar 17 '25 23:03 atoulme

This PR was marked stale due to lack of activity. It will be closed in 14 days.

github-actions[bot] avatar Apr 01 '25 05:04 github-actions[bot]

This PR was marked stale due to lack of activity. It will be closed in 14 days.

github-actions[bot] avatar May 15 '25 05:05 github-actions[bot]

I'm very interested in the feature implemented in this PR. Is this feature still planned for inclusion? If so, is there anything I can do to help move it towards being merged? (e.g., assisting with testing, documentation, or addressing any outstanding feedback?)

shivam-abstract avatar May 19 '25 08:05 shivam-abstract

CLA Signed

The committers listed above are authorized under a signed CLA.

  • :white_check_mark: login: dpaasman00 / name: Dakota Paasman (1cd211cd927819ed9a2d7fb90adc9af1fd8f1a9d, 33489566604200a06d2dc55d3bf66425359909bb, 37cf953200d2a71cc087661e0acbbab56c44f336, 83f46248a16e6529b5593ee1c5f9082525251a5b, 8bfdc1d3a153b9c28e72aac3eff6e8ba43e71dad, add631a0a35913cc5ce787f645a1759abbfea238, f700e01f939de54301cc5083887330d46a98ca69, fda3ee7b6ec7d0216307b93c94352ae163af722e)

This PR was marked stale due to lack of activity. It will be closed in 14 days.

github-actions[bot] avatar Jun 19 '25 05:06 github-actions[bot]

Closed as inactive. Feel free to reopen if this PR is still being worked on.

github-actions[bot] avatar Jul 03 '25 05:07 github-actions[bot]

Hi maintainers đź‘‹,

Our team is very interested in the feature set introduced in #35503 and would love to help move it toward merge.

How we can help

  • Contribute unit / integration tests and documentation.
  • Review edge‑cases for offline or restricted environments.
  • Prototype an extensible signature‑verification mechanism.
  • Take on any additional tasks you identify—just let us know what would be most valuable.

Our main use‑case
During agent upgrades we need to sign and run our own builds of opentelemetry‑collector and opampsupervisor using our production PKI.

Would you consider abstracting the signature‑verification logic behind a small interface (e.g. SignatureVerifier) so that:

  1. Sigstore remains the default backend.
  2. Users can plug in a custom backend - whether via a dedicated Supervisor builder, DI, or another approach—without forking Supervisor (and ideally with an offline‑verification option).

If this direction sounds reasonable, we can:

  • draft a brief design‑doc / ADR,
  • deliver a PoC PR,
  • and help maintain the code after merge.

Thank you for your work on OpAMP Supervisor! We're looking forward to collaborating.

OrangeFlag avatar Jul 18 '25 23:07 OrangeFlag

I will be working on getting this branch and pull request up to date

dpaasman00 avatar Jul 23 '25 18:07 dpaasman00

This PR was marked stale due to lack of activity. It will be closed in 14 days.

github-actions[bot] avatar Aug 07 '25 05:08 github-actions[bot]

The e2e test will continue failing until this PR in OpAmp-Go is included.

dpaasman00 avatar Aug 12 '25 19:08 dpaasman00

This PR was marked stale due to lack of activity. It will be closed in 14 days.

github-actions[bot] avatar Sep 03 '25 05:09 github-actions[bot]

New issue in OpAMP Go, this time related to setting the capabilities needed for upgrades. Have an issue opened here: https://github.com/open-telemetry/opamp-go/issues/448

dpaasman00 avatar Sep 18 '25 17:09 dpaasman00

This PR was marked stale due to lack of activity. It will be closed in 14 days.

github-actions[bot] avatar Oct 13 '25 05:10 github-actions[bot]

This PR was marked stale due to lack of activity. It will be closed in 14 days.

github-actions[bot] avatar Nov 11 '25 05:11 github-actions[bot]

Hi, I found a bug where the supervisor writes the temporary bootstrap port to effective.yaml instead of the main supervisor port, causing the agent to become unhealthy after a binary upgrade (as it is unable to connect to the supervisor). I've opened a PR with a fix.

hdilip-pub avatar Nov 17 '25 23:11 hdilip-pub

Please fix conflicts and mark ready to review again.

atoulme avatar Nov 20 '25 05:11 atoulme