Proposal: Providing a Consistent CI/CD Experience
Introduction
Across all OpenTelemetry repositories there are currently five different active CI providers. Each of these providers has its own way of executing tests, interacting with the user, and publishing test results. This can make it difficult for newcomers to contribute to multiple OpenTelemetry repositories.
Current Landscape
Repository | CI Provider | Automated Build and Test | Code Coverage | Automated Performance Testing | Automated Deployment | Automated Docs Deployment |
---|---|---|---|---|---|---|
Collector | CircleCI | [x] | [x] | [x] | [x] | [] |
C++ | GHA | [x] | [x] | [x] | [] | [] |
JavaScript | CircleCI/GHA | [x] | [x] | [] | [] | [x] |
.NET | Azure | [x] | [x] | [] | [] | [] |
PHP | Travis | [x] | [x] | [] | [] | [] |
Java | CircleCI | [x] | [x] | [x] | [x] | [x] |
Python | Travis/CircleCI | [x] | [x] | [] | [x] | [x] |
Ruby | CircleCI | [x] | [] | [] | [x] | [] |
Go | CircleCI | [x] | [x] | [] | [] | [] |
Swift | GHA/Scope | [x] | [] | [] | [] | [] |
Rust | CircleCI | [x] | [x] | [] | [] | [x] |
Erlang | CircleCI | [x] | [x] | [x] | [] | [] |
Proposal
I propose that all languages consider using the same CI provider. This would create a more consistent development process and make it easier for developers to contribute to multiple language libraries.
We suggest that provider be GitHub Actions. Here’s why:
Ease-of-Use
CircleCI and Travis will automatically run when pull requests and commits are issued against the repository. But if a contributor forks the repository, unless they set up an account with the CI provider and link it to their forked repository, CI will not be activated and tests will not be run automatically.
In contrast, GitHub Actions works out of the box on a forked repository and can be easily configured to run a test workflow each time a commit is issued. This would help individual contributors test their code and ensure code quality before submitting a pull request against the repository.
Transparency
Current CI providers such as CircleCI and Travis allow anyone to view the console output when building and running tests, but the test results cannot be seen anywhere on the GitHub repository. To view this testing output, you need to go to a different website, navigate a different user interface, and then sift through thousands of lines of console output. This is not a seamless developer experience.
In contrast, using GitHub Actions would provide all testing output directly on the repository’s GitHub page, which would help contributors to find, read, and use the test output to maintain code quality.
Control
GitHub Actions’ integration with other GitHub features means you can have finer control over the CI pipeline. For example, certain workflows can be set to only run on a new release. Workflows can even be used to close stale issues and pull requests.
Recommendation
I recommend that we consider using one consistent CI provider, GitHub Actions, which provides an integrated and seamless developer experience for all contributors.
Example
Please see this example that the C++ repository has adopted for the above reasons.
Thank you for the analysis. The last time this question was discussed, we intentionally left the choice of CI pipeline to the maintainers of individual repositories. Given all the advantages you mentioned, there may be enough incentive for maintainers to switch. If not, I'm not sure how we can weigh the benefit of making it easy for newcomers to contribute to many repositories against the potential issues and overhead for maintainers of switching to GitHub Actions.
If you want to take the lead on helping individual repositories with this switch, that would be great.
Also, for the standardization effort I'd advocate for even greater transparency and suggest we make builds fully containerized. This will make builds even more transparent and easier to try locally. It will also ensure that anybody can release a version without depending on GitHub.
For Java, we get automatic Javadoc publishing via javadoc.io. https://www.javadoc.io/doc/io.opentelemetry I'm not sure if the docs-deployment was another thing, though.
> For Java, we get automatic Javadoc publishing via javadoc.io. https://www.javadoc.io/doc/io.opentelemetry I'm not sure if the docs-deployment was another thing, though.

Good catch! Updated.
hey @Brandon-Kimberly! in the Java Instrumentation repository, we are currently using CircleCI's xlarge instances (8 cores, 16 GB). I tried super unsuccessfully a while back to fit into the CircleCI free tier (2 GB), but I think it's much more likely we could fit into the GitHub Actions runner (2 cores, 7 GB). We would need to reduce parallelism within each job, which would likely bump build time from ~20 min to ~1 hour, so we'll probably also need to split out into more parallel jobs in order to keep the build time reasonable (well, i don't want to imply ~20 min build time is reasonable, but at least not worse than that?)
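As a rough sanity check on those numbers, here is a minimal Python sketch. It assumes the build scales perfectly linearly with core count, which is an idealization (the ~1 hour estimate above suggests the real build does not), and the function names are hypothetical, not part of any CI tooling.

```python
import math

def wall_clock_minutes(core_minutes, cores_per_runner, parallel_jobs=1):
    """Idealized build time if work splits evenly over all available cores."""
    return core_minutes / (cores_per_runner * parallel_jobs)

def jobs_needed(core_minutes, cores_per_runner, target_minutes):
    """Smallest number of parallel jobs that meets the target build time."""
    return math.ceil(core_minutes / (cores_per_runner * target_minutes))

# ~20 minutes on an 8-core xlarge instance, expressed as total core-minutes.
CORE_MINUTES = 20 * 8

# A single 2-core runner under the linear-scaling assumption.
print(wall_clock_minutes(CORE_MINUTES, cores_per_runner=2))  # 80.0

# Parallel 2-core jobs needed to get back to ~20 minutes.
print(jobs_needed(CORE_MINUTES, cores_per_runner=2, target_minutes=20))  # 4
```

Under this assumption, about four 2-core jobs would be needed just to match the current build time.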
also, it's news to me that i'm the mentor for this project, i don't object though 😄
We are pretty heavily invested in CircleCI in the Collector. Unless someone volunteers to migrate all of it then moving to Github actions or anywhere else is going to be problematic.
I agree that CircleCI seems to be pretty broadly supported/common for Go projects, and thus that we as the Go developers would probably prefer to use Circle, and we've heard above that the Collector wants to remain on CircleCI as well...
Updated list: .NET repo migrated from Azure pipelines to Github actions.
I'm closing this issue. Please re-open if there are more arguments for aligning CI tools.
I feel like we need more discussion on this. Maybe not all projects will be able to migrate to GH actions, but some could? Maybe @trask still wants to go that way.
In any case, we should make it clear that we have overall guidelines, and the conclusions from this ticket could help.
PS - @trask no worries, you won't be a mentor, we only needed your feedback 😃 (unless you have free cycles to mentor this).
What kind of conclusion do you feel would be useful beyond declaring that it's at the maintainers' discretion, but we recommend GitHub Actions?
BTW, as I mentioned in comment above, do we want to push for fully dockerized build definitions?
> beyond declaring that it's at the maintainers' discretion, but we recommend GitHub Actions.

I think this would be a good thing, yes (in case we reach an agreement). In this case, we would get a little bit of uniformity (hopefully we will only be using GH Actions + CircleCI).
> we only needed your feedback

I started converting Java Instrumentation repo to Github Actions.
In Github Actions we need to parallelize across lots of jobs to get performance similar to what we were getting from CircleCI, where we are both parallelizing across jobs and within jobs (by using larger hardware on a paid plan).
But Github Actions has max 20 parallel builds across the whole org (we can bump that up with a paid plan).
Also, there's not a configuration to auto-cancel old builds when you push updates to a PR, which ends up really clogging those 20 parallel builds for a long time when people append PRs (which seems to happen fairly often, given that our builds take a long time in the first place). There are some custom Github Actions to auto cancel old builds (https://github.com/marketplace?type=actions&query=cancel) but they rely on getting a chance to run, which they don't when your whole build queue is clogged.
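To make the "auto-cancel old builds" idea concrete, here is a minimal Python sketch of the selection logic only. The `Run` record and its fields are hypothetical and do not correspond to any GitHub API, and the sketch assumes run ids increase with creation time.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Run:
    run_id: int   # hypothetical id, assumed to increase with creation time
    branch: str   # branch (or PR) the build was triggered for
    status: str   # "queued", "in_progress", or "completed"

def superseded_runs(runs):
    """Return unfinished runs that have a newer run on the same branch."""
    newest = {}
    for run in runs:
        if run.branch not in newest or run.run_id > newest[run.branch]:
            newest[run.branch] = run.run_id
    return [
        run for run in runs
        if run.status != "completed" and run.run_id < newest[run.branch]
    ]

runs = [
    Run(1, "pr-1", "in_progress"),
    Run(2, "pr-1", "queued"),
    Run(3, "pr-2", "queued"),
    Run(4, "pr-1", "queued"),
]
print([run.run_id for run in superseded_runs(runs)])  # [1, 2]
```

Anything doing this from inside a workflow still has to get a runner before it can cancel anything, which is exactly the problem when the queue is already full.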
We had a brief chat with @trask regarding the CI tools for OpenTelemetry, and I would definitely suggest migrating to the GitHub Actions.
CNCF has already established agreements and generous plans with Azure Pipelines and GitHub Actions, and we already have a good experience of other CNCF project migration to these solutions.
> But Github Actions has max 20 parallel builds across the whole org (we can bump that up with a paid plan).

E.g., this is not an issue with the GHA offering for CNCF.
Great to hear that the GHA limitations don't apply to the OT account. @Brandon-Kimberly can you post an update on the coverage we've completed for various repos :-)
This is the current landscape of CI/CD in OpenTelemetry:
Repository | CI/CD Provider | Using GitHub Actions as primary CI? | Migrated In Last 6 Weeks? |
---|---|---|---|
Collector | CircleCI | ❌ | |
Python | CircleCI/GHA | ❌ | |
JS | CircleCI/GHA | ❌ | |
C++ | GHA | ✅ | ✅ |
.NET | Azure/GHA | ✅ | ✅ |
Go | CircleCI | ❌ | |
PHP | GHA | ✅ | ✅ |
Java | CircleCI | ❌ | |
Rust | GHA | ✅ | ✅ |
Swift | GHA/Scope | ✅ | |
Ruby | GHA | ✅ | ✅ |
Erlang | CircleCI/GHA | ✅ | ✅ |
Java-Instr | CircleCI | ❌ | |
Hey @Brandon-Kimberly, can you add the Java Instrumentation repo to your table (not that it's green, but so we track it)? thx!
> We had a brief chat with @trask regarding the CI tools for OpenTelemetry, and I would definitely suggest migrating to the GitHub Actions.
>
> CNCF has already established agreements and generous plans with Azure Pipelines and GitHub Actions, and we already have a good experience of other CNCF project migration to these solutions.
>
> > But Github Actions has max 20 parallel builds across the whole org (we can bump that up with a paid plan).
>
> Eg, this is not an issue with the GHA offering for CNCF.

What would be the actual limit on max parallel builds? Is it 60, or unlimited?
Right now we are on the Team subscription with a limit of 60 concurrent jobs. We received it by asking GitHub directly. @idvoretskyi if there is an agreement between CNCF and GitHub that enables a bigger limit, we will definitely benefit from it, as there are simply too many active groups working on different language SDKs and other components.
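For a feel of what the concurrency limit means in practice, here is a small Python sketch. It assumes all jobs are submitted at once and take the same time. The backlog of 90 jobs at 20 minutes each is a made-up illustration, not a measurement from any OpenTelemetry repo, and `drain_minutes` is a hypothetical helper.

```python
import math

def drain_minutes(num_jobs, job_minutes, concurrency_limit):
    """Time to finish a backlog when jobs run in full waves of equal length."""
    waves = math.ceil(num_jobs / concurrency_limit)
    return waves * job_minutes

# Hypothetical backlog: 90 jobs of 20 minutes each across the whole org.
print(drain_minutes(90, 20, concurrency_limit=20))  # 100
print(drain_minutes(90, 20, concurrency_limit=60))  # 40
```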
@SergeyKanzhelev Should not be an issue from the billing standpoint, but let me double-check if there are any technical limitations on the GHA side.
All the Go repos have automatic docs because of godoc.
> Should not be an issue from the billing standpoint, but let me double-check if there are any technical limitations on the GHA side.

Any news on this, @idvoretskyi ?
> > Should not be an issue from the billing standpoint, but let me double-check if there are any technical limitations on the GHA side.
>
> Any news on this, @idvoretskyi ?

The current status is that we are on the Team subscription and watching whether it turns out to be insufficient. If we hit the limit, we will continue the conversation. Are there any concerns about the current number of jobs, or anticipated problems that are blocking something?
Not any immediate problems, but I am thinking if it is a good idea to bring a massively parallel job to Java instrumentation repo.
@iNikem we are in the process of exploring expanded options, but as @SergeyKanzhelev mentioned, it's not an immediate need AFAIK.
> Not any immediate problems, but I am thinking if it is a good idea to bring a massively parallel job to Java instrumentation repo.

If there are known numbers that wouldn't fit within the current limits and are blocking migration, let's discuss.
Rust currently has code coverage as well as automated docs deployment
@Brandon-Kimberly, can you open the similar "Proposal: Use GitHub Actions for CI/CD" issue in https://github.com/open-telemetry/opentelemetry-dotnet-instrumentation repo? Or alternatively can I copy-paste the text of your issues and do it myself? :)
> @Brandon-Kimberly, can you open the similar "Proposal: Use GitHub Actions for CI/CD" issue in https://github.com/open-telemetry/opentelemetry-dotnet-instrumentation repo? Or alternatively can I copy-paste the text of your issues and do it myself? :)

@Brandon-Kimberly, scratch that, they beat me to it :)
I think this can be closed as a great success! Thanks everyone who contributed to the effort of converting to GitHub Actions!