[cmd/opampsupervisor]: Implement PackagesAvailable for upgrading agent
Description: Implements ReportPackageStatuses and taking PackagesAvailable for upgrading the agent.
The agent will only accept a top-level package with an empty name. This agent must be signed using cosign's keyless signing method (this is how the opentelemetry-collector-releases repository signs its releases).
The signature field must be populated with the resultant base64 encoded cert and signature, both being space separated (signature = b64_cert + " " + b64_signature).
This first implementation only allows online verification; In order to verify the certificate, it must reach out to the public rekor transparency log instance. It also fetches certificates from the internet. Some of these things are configurable through environment variables, but I figure we can parse out how offline signature verification works in a follow-up PR. This basic setup should allow for signature verification for agents that have access to the internet.
This PR also does not revert the collector if it is unhealthy. This will need to be done in a follow up PR. I think we should do it after #34907 is merged, as I imagine the logic will overlap here.
Link to tracking Issue: Closes #34734, Partially solves #33947
Testing:
- Unit testing, e2e testing
@tigrannajaryan I'd appreciate you taking a look at this, even if just to review conformance to the spec.
I think we need a design doc or some other form of documentation that explains all the involved parties (cosign, Collector releases, OpAMP Server, Supervisor) and how they dance together to make sure the downloaded binary is what it pretends to be and is safe to use.
Yeah, good call out. I will work on this and get something in our spec that explains everything here.
@andykellr or @dpaasman00 do you plan to take over this PR?
@tigrannajaryan Yes, I will be taking this PR over!
@evan-bradley @tigrannajaryan Are you able to take another look when you have a chance? I believe feedback you left for Brandon was addressed, so let me know if there's anything that was missed!
I am most concerned about the hash/signature verification parts. We want to make sure we don't have any vulnerabilities here and don't introduce any in the future.
I think it is important that we have good test coverage for the verification code. Would you be able to add some tests? I can't see coverage results for this PR in codecov. Ideally we want to have full branch coverage for UpdateContent and funcs it calls.
@dpaasman00 ping me when you want me to do another round.
This PR was marked stale due to lack of activity. It will be closed in 14 days.
Planning to get back and address review by the end of the next week, December 13.
This PR was marked stale due to lack of activity. It will be closed in 14 days.
This PR was marked stale due to lack of activity. It will be closed in 14 days.
Closed as inactive. Feel free to reopen if this PR is still being worked on.
@tigrannajaryan This should be ready now for another look!
Please address the failing CI and fix conflicts and mark the PR as ready for review again, thanks!
This PR was marked stale due to lack of activity. It will be closed in 14 days.
This PR was marked stale due to lack of activity. It will be closed in 14 days.
I'm very interested in the feature implemented in this PR. Is this feature still planned for inclusion? If so, is there anything I can do to help move it towards being merged? (e.g., assisting with testing, documentation, or addressing any outstanding feedback?)
The committers listed above are authorized under a signed CLA.
- :white_check_mark: login: dpaasman00 / name: Dakota Paasman (1cd211cd927819ed9a2d7fb90adc9af1fd8f1a9d, 33489566604200a06d2dc55d3bf66425359909bb, 37cf953200d2a71cc087661e0acbbab56c44f336, 83f46248a16e6529b5593ee1c5f9082525251a5b, 8bfdc1d3a153b9c28e72aac3eff6e8ba43e71dad, add631a0a35913cc5ce787f645a1759abbfea238, f700e01f939de54301cc5083887330d46a98ca69, fda3ee7b6ec7d0216307b93c94352ae163af722e)
This PR was marked stale due to lack of activity. It will be closed in 14 days.
Closed as inactive. Feel free to reopen if this PR is still being worked on.
Hi maintainers đź‘‹,
Our team is very interested in the feature set introduced in #35503 and would love to help move it toward merge.
How we can help
- Contribute unit / integration tests and documentation.
- Review edge‑cases for offline or restricted environments.
- Prototype an extensible signature‑verification mechanism.
- Take on any additional tasks you identify—just let us know what would be most valuable.
Our main use‑case
During agent upgrades we need to sign and run our own builds of opentelemetry‑collector and opampsupervisor using our production PKI.
Would you consider abstracting the signature‑verification logic behind a small interface (e.g. SignatureVerifier) so that:
- Sigstore remains the default backend.
- Users can plug in a custom backend - whether via a dedicated Supervisor builder, DI, or another approach—without forking Supervisor (and ideally with an offline‑verification option).
If this direction sounds reasonable, we can:
- draft a brief design‑doc / ADR,
- deliver a PoC PR,
- and help maintain the code after merge.
Thank you for your work on OpAMP Supervisor! We're looking forward to collaborating.
I will be working on getting this branch and pull request up to date
This PR was marked stale due to lack of activity. It will be closed in 14 days.
The e2e test will continue failing until this PR in OpAmp-Go is included.
This PR was marked stale due to lack of activity. It will be closed in 14 days.
New issue in OpAMP Go, this time related to setting the capabilities needed for upgrades. Have an issue opened here: https://github.com/open-telemetry/opamp-go/issues/448
This PR was marked stale due to lack of activity. It will be closed in 14 days.
This PR was marked stale due to lack of activity. It will be closed in 14 days.
Hi, I found a bug where the supervisor writes the temporary bootstrap port to effective.yaml instead of the main supervisor port, causing the agent to become unhealthy after a binary upgrade (as it is unable to connect to the supervisor). I've opened a PR with a fix.
Please fix conflicts and mark ready to review again.