PROPOSAL: Transfer ownership of go.opentelemetry.io to OpenTelemetry
tl;dr: All requests for go.opentelemetry.io currently route through an abandoned project that only Google employees have access to. This outlines a plan to transfer ownership of this project to the OpenTelemetry community.
Background
The app which routes requests for go.opentelemetry.io is owned by Google in a GCP project under the google.com organization. This app is very outdated and is not actively maintained by anyone at Google.
Any request for go.opentelemetry.io (or a path under it, such as go.opentelemetry.io/otel) is served by this app. This includes every import in any Go project, for example via go get (when not using a cached version of the dependency or a custom proxy). This also affects Collector (+Contrib) imports.
The app is a deployment of https://github.com/GoogleCloudPlatform/govanityurls, which was last updated in March 2020. This app is essentially a proxy, where any request for the "vanity URL" is routed to an actual GitHub/pkg.go.dev path and the response is returned. This project is also not actively maintained by anyone at Google.
OpenTelemetry uses govanityurls in https://github.com/open-telemetry/opentelemetry-go-vanityurls/. This repo contains a config file for the vanity routes and a deploy script.
The deploy script is run as a postsubmit CircleCI job that deploys the upstream govanityurls project from HEAD to the opentelemetry-go GCP project under google.com. The app then serves all requests for go.opentelemetry.io imports by default, at over 20,000 requests/day.
The limited ownership and maintenance of this project creates a critical single point of failure for the OpenTelemetry Go ecosystem.
Goals
This plan has the following goals:
- Immediately stabilize the app and transfer go.opentelemetry.io to the community's control.
- Establish a governance plan that facilitates this transfer and allows the community to define its own secure long-term access.
- Allow the community to openly discuss future plans for this piece of infrastructure at its own pace.
Non-goals
This plan does not include the following:
- Debate over the optimal high-level long-term solution. The ownership of this project is an immediate concern that can and should be addressed quickly. Following this transition, the community is free to explore alternative designs for handling go.opentelemetry.io requests in more detail.
Migration plan
This plan includes the following steps:
- Fork https://github.com/GoogleCloudPlatform/govanityurls to an OpenTelemetry repo.
- Create an OpenTelemetry-owned cloud project to host the app.
- Update https://github.com/open-telemetry/opentelemetry-go-vanityurls/ to deploy the app to the new cloud project.
- Update DNS records for go.opentelemetry.io to point to the new cloud project.
Step 1: Fork GoogleCloudPlatform/govanityurls
Who: TC (to create the repo), @damemi and @MrAlias (to update dependencies)
Because the upstream project running this app is no longer maintained, the Otel community needs a fork that we have merge permissions on. We will create that fork (i.e., github.com/open-telemetry/govanityurls). We will then update the dependencies in that fork to the latest available releases.
Step 2: Create an OpenTelemetry-owned cloud project
Who: TC (to create project), @damemi and @MrAlias if necessary
Currently, the config repo at https://github.com/open-telemetry/opentelemetry-go-vanityurls uses GCP to deploy changes to the app to an App Engine service. It will be easiest for OpenTelemetry to create a new GCP project and migrate the existing project to the new one.
Step 3: Update open-telemetry/opentelemetry-go-vanityurls
Who: @damemi, @MrAlias
The config repo will need to point to the new GCP project. This should be a one-line change in the deploy script. However, there may be additional project settings to link this GitHub repo to the new project via a service account.
Step 4: Update DNS records for go.opentelemetry.io
Who: TC
This is the final step for this change to take full effect. While it should transition with zero downtime, it should be communicated to the community in the event of any potential downtime.
We should first confirm that the current DNS TTL for go.opentelemetry.io is set to a reasonable amount that will allow this switch to happen in a timely manner.
Next, the DNS records will be updated to point to the new GCP project/service. As the DNS records propagate, requests will automatically route to the new service with minimal, if any, downtime expected.
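One way to confirm that the new service answers correctly after the switch is to check the go-import tag it returns. The sketch below assumes the page body has already been fetched (for example from https://go.opentelemetry.io/otel?go-get=1, the query the go command issues). The `parseGoImports` and `verify` helpers are hypothetical names and the sample page is illustrative. It uses a lenient XML tokenizer so that ordinary, non-XML HTML is tolerated.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"strings"
)

// goImport is one parsed <meta name="go-import"> tag.
type goImport struct {
	Prefix, VCS, RepoRoot string
}

// parseGoImports collects go-import meta tags from an HTML document.
// Tags whose content does not have exactly three fields are ignored.
func parseGoImports(page string) []goImport {
	d := xml.NewDecoder(strings.NewReader(page))
	d.Strict = false // accept HTML that is not well-formed XML
	var out []goImport
	for {
		tok, err := d.RawToken()
		if err != nil {
			return out // end of input or unparsable remainder
		}
		el, ok := tok.(xml.StartElement)
		if !ok || !strings.EqualFold(el.Name.Local, "meta") {
			continue
		}
		var name, content string
		for _, a := range el.Attr {
			switch strings.ToLower(a.Name.Local) {
			case "name":
				name = a.Value
			case "content":
				content = a.Value
			}
		}
		if f := strings.Fields(content); name == "go-import" && len(f) == 3 {
			out = append(out, goImport{f[0], f[1], f[2]})
		}
	}
}

// verify returns an error unless the page maps importPath to wantRepo via git.
func verify(page, importPath, wantRepo string) error {
	for _, gi := range parseGoImports(page) {
		if importPath != gi.Prefix && !strings.HasPrefix(importPath, gi.Prefix+"/") {
			continue // this tag covers a different import prefix
		}
		if gi.VCS != "git" || gi.RepoRoot != wantRepo {
			return fmt.Errorf("%s resolves to %s %s, want git %s",
				importPath, gi.VCS, gi.RepoRoot, wantRepo)
		}
		return nil
	}
	return fmt.Errorf("no go-import tag covers %s", importPath)
}

func main() {
	// Illustrative response body; a real check would fetch it over HTTPS.
	page := `<!DOCTYPE html><html><head>
<meta name="go-import" content="go.opentelemetry.io/otel git https://github.com/open-telemetry/opentelemetry-go">
</head><body></body></html>`
	err := verify(page, "go.opentelemetry.io/otel/sdk",
		"https://github.com/open-telemetry/opentelemetry-go")
	fmt.Println("verification error:", err)
}
```

Running the same check against both the old and the new service before changing DNS would show whether the two return equivalent tags.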
Cleanup
When traffic to the old service drops to 0, the old GCP project will be deleted.
Governance
The maintainers of the new GitHub project will be:
- @open-telemetry/go-maintainers
- @open-telemetry/go-instrumentation-maintainers
- @open-telemetry/collector-maintainers
- @open-telemetry/collector-contrib-maintainers
The maintainers of the new Cloud project will be:
- @open-telemetry/technical-committee
During the transition, temporary additional access may need to be granted to non-TC members who need to set up and verify the new project (myself and @MrAlias have both volunteered).
Timeline
This change is high priority due to the central importance of go.opentelemetry.io to the OpenTelemetry Go ecosystem. Work on this transition should begin immediately and complete as soon as possible.
Future work
As part of our contributions to OpenTelemetry, the Google team is willing to assist with any work needed for this transition.
Following the transition, the OpenTelemetry community is free to treat this like any other part of infrastructure, including assessing long-term solutions and allocating maintainers.
One such task would be evaluating the use of govanityurls vs alternative proxies for this kind of request handling. The community may also choose to evaluate alternative deployment methods or cloud providers altogether. Neither of these is in scope for this transition plan, which has the top priority of putting the project into a stable state, as soon as possible, with as few changes as possible.
cc @jsuereth @dashpole
Overall seems fine to me, although I'd advocate for us to migrate over to our Hugo + Netlify setup as a fast follow.
I've opened https://cncfservicedesk.atlassian.net/servicedesk/customer/portal/1/CNCFSD-2407 to request a GCP account for this purpose.
Overall seems fine to me, although I'd advocate for us to migrate over to our Hugo + Netlify setup as a fast follow.
Yeah, this should get a good review once the immediate transfer is done
Should the new GitHub project also include @open-telemetry/go-approvers?
If I understand it correctly the vanity URLs are sourced from this file:
https://github.com/open-telemetry/opentelemetry-go-vanityurls/blob/f252f17172d141c5798465dacd6ca81d5700be5f/go.opentelemetry.io/vanity.yaml
This file had 7 updates in 5 years, so while having the workflow in place is a nice to have, it doesn't seem to be mandatory.
As @austinlparker suggested we can accomplish the same by leveraging hugo + netlify.
I am saying that because the completion of this issue depends on step 2 (Create an OpenTelemetry-owned cloud project). Step 1 is nice to have, but it does not seem mandatory to me. Steps 1, 3, and 4 are entirely under our control. If the creation of that community-owned cloud project takes time, we can already look into the alternative, either as a fallback or as a future solution.
@svrnm based on Austin's comment above, it sounds like creating a GCP project shouldn't take much time. The rest of the steps I've outlined let us drop-in the existing solution ASAP. This is a transition plan, not a long-term plan. We can absolutely look at alternatives once that's done.
apologies for my misleading comment, I fully support that we do the transition to a community-owned GCP instance first, I wanted to offer a fallback if the GCP project creation takes more time than expected, because I am worried that it is not a quick process, although it depends on what your expected timeline is (days? weeks?)
In that context, can you share some details on the sizing and properties of the instance needed, so we have it ready when we can set it up.
@svrnm no worries, sorry I misunderstood you. The app is running on an F1 App Engine instance, which seems to be the only workload-handling resource in the project. The service has a custom domain configured for go.opentelemetry.io and uses the default App Engine service account.
There is also a service account for CircleCI to deploy the app from GitHub, which has the following IAM roles:
- App Engine Deployer
- App Engine Service Admin
- Cloud Build Editor
- Service Account User
- Storage Object Creator
- Storage Object Viewer
I think all of this should be easily movable to an OpenTelemetry GCP organization if we follow https://cloud.google.com/resource-manager/docs/moving-projects-folders#console. That depends on if there is an Otel GCP org. Otherwise, since it's currently associated with the google org, we can't change it back to "no organization" and will need an entirely new project with these settings.
I have a branch for the dependency updates to govanityurls here: https://github.com/damemi/govanityurls/tree/update-go-122. Once the Otel fork is up I'll open a PR to that repo.
I also kicked off the process of finding the owners of the GCP govanityurls repo. It seems abandoned at this point; if that's the case, I'd like to mark it as officially no longer maintained.
@svrnm based on Austin's comment above, it sounds like creating a GCP project shouldn't take much time. The rest of the steps I've outlined let us drop-in the existing solution ASAP. This is a transition plan, not a long-term plan. We can absolutely look at alternatives once that's done.
I do want to set expectations here appropriately; I believe it won't take too much time, but historically it can take a couple of weeks before we get our service desk tickets resolved. I have updated the ticket to indicate the urgency of this situation, but it's kinda out of our hands.
Thanks @austinlparker. If it is going to take a while I'm happy to help parallelize this by working on a hugo+netlify solution in the meantime. Worst case there is we get a head start on the fast follow.
My only issues with that are:
- I've never used either, so I will need a detailed doc/plan from someone else or time to learn how to set this up on my own (and this probably shouldn't be my first time running hugo+netlify)
- Changing the backend and the ownership simultaneously is a couple variables to add at once. I think it's safer for a critical piece of infra like this to ratchet the change one step at a time.
I've never used either, so I will need a detailed doc/plan from someone else or time to learn how to set this up on my own (and this probably shouldn't be my first time running hugo+netlify)
@open-telemetry/docs-maintainers, and especially @chalin have all the expertise needed for that, so don't worry :-) -- We can create a prototype as needed to showcase how it works.
But for now, let's give the ServiceDesk ticket some more time to be resolved.
Sorry for my stupid question, but isn't it just a matter of having a github repo named go.opentelemetry.io with a CNAME? Each directory (/otel) would then have an index.html with the appropriate go-import meta tag... Why do we need a GCP application somewhere in the first place?
Sorry for my stupid question, but isn't it just a matter of having a github repo named go.opentelemetry.io with a CNAME? Each directory (/otel) would then have an index.html with the appropriate go-import meta tag... Why do we need a GCP application somewhere in the first place?
For the same reasons the netlify+hugo solution is a fallback/alternative:
Changing the backend and the ownership simultaneously is a couple variables to add at once. I think it's safer for a critical piece of infra like this to ratchet the change one step at a time.
Note, that @chalin and I discussed a potential solution via netlify, and based on that I cobbled together a prototype, see https://github.com/open-telemetry/opentelemetry.io/pull/5022
Note, I also just checked the Service Desk issue, there is still no response.
For the same reasons the netlify+hugo solution is a fallback/alternative
Sorry, I meant github pages directly as opposed to Hugo+Netlify. IMO, being able to send a pull request to a repo named go.opentelemetry.io would be more intuitive, and we don't have that many repos to justify the added complexity.
Ah, ok, that's a viable option indeed! The vanity URLs are really simple, so a repo with some HTML files would probably do the trick
It would be good to go through the vanity url app's handler code to make sure everything is possible in a static site. I haven't dug too deep into it, but it seems weird to me that something like this exists in the first place if it can be done with just a proxy site. Maybe the purpose is just to be an abstracted solution?
I would strongly ask that we move forward with the original plan before we try to make too many changes at once.
The existing application serves builds for many projects that have high use and high impact (e.g., Kubernetes, Grafana, Prometheus, Moby). There are many more vendors that rely on this package site being up to continue business operations. Availability needs to be an important point in this discussion, both in the short-term cutover and in the long-term operational support.
This application is currently run on a platform with an uptime SLA of >99.95%. We have engineers managing the project with direct lines of communication with the platform team, and the Go maintainers are familiar with the technology. These are all things that inspire confidence in the current design.
For alternate proposals, can you please provide an overview of the reliability that the alternatives will offer?
I would strongly ask that we move forward with the original plan before we try to make too many changes at once.
Main reason why we discuss alternatives is that the Service Desk issue takes some undefined time, and this is about having a fallback, if it takes too long (whatever that means). Until then going forward with the original plan is ... the plan.
It would be good to go through the vanity url app's handler code to make sure everything is possible in a static site. I haven't dug too deep into it, but it seems weird to me that something like this exists in the first place if it can be done with just a proxy site. Maybe the purpose is just to be an abstracted solution?
I had the same thought, but apparently it's very basic, see https://go.dev/ref/mod#serving-from-proxy:
When the go command downloads a module in direct mode, it first looks up the module server’s URL with an HTTP GET request based on the module path. It looks for a tag with the name go-import in the HTML response. The tag’s content must contain the repository root path, the version control system, and the URL, separated by spaces. See Finding a repository for a module path for details.
Same for the go-source based on this document.
The only "more complex" thing is in the refresh header and the body link, where the dynamic sub path gets included. But none of that is needed to make this work, this is pure convenience if someone hits that page with a browser. In hugo+netlify we need to do some redirect magic to replicate that.
Removing that, what remains is
<meta name="go-import" content="{{.Import}} {{.VCS}} {{.Repo}}">
<meta name="go-source" content="{{.Import}} {{.Display}}">
which is all static
For alternate proposals, can you please provide an overview of the reliability that the alternatives will offer?
For netlify we are on the enterprise plan, which has a 99.99% uptime SLA (search for SLA on this page: https://www.netlify.com/pricing/).
GitHub seems to have a 99.9% uptime SLA (https://github.com/customer-terms/github-online-services-sla).
I had the same thought, but apparently it's very basic, see https://go.dev/ref/mod#serving-from-proxy
Interesting, maybe this project predates that? Would explain why it's been abandoned at least. FYI I'm poking around Google to find if there are any active maintainers for govanityurls and if not, I'm planning to mark the project as officially not maintained
These are all things that inspire confidence in the current design.
on the other hand, having to fork an abandoned project to keep it working doesn't inspire confidence 😅
The current script seems very generic, with pieces that are not relevant for us (handling Bitbucket repositories for Subversion and Mercurial, among others), all of that to serve a simple HTML page with a couple of meta tags (the ones @svrnm linked above). We have only a few Go repositories, all of them hosted on GitHub: we could certainly have a few static HTML pages served via GitHub Pages...
Hi all, quick update - we now have a GCP account for this. Please let me know who I need to add to the project in order to move this along.
Thanks @austinlparker, can you please add me ([email protected]) to the GCP project? @MrAlias also but I don't know which email he would like to use.
Can we also get the fork repo created under github.com/open-telemetry/govanityurls? Not sure who needs to do that (@svrnm @jpkrohling @jsuereth ?)
For the record, I 100% agree with the discussion around a better solution if I wasn't clear about that. Tyler mentioned to me offline that there could be some dns/scaling issues with the netlify proposal -- I'm not familiar enough to speak on that but he could explain more here. I am just in the camp that going one step at a time and migrating to a longer term solution is safer.
Otherwise, I'm ready to set up the GCP app now and happy to help with whatever alternatives we decide on too
@damemi I've added you as an owner to the project; Let me know if a lesser role is suitable.
@austinlparker thanks, just got it. I should definitely be removed before we switch the DNS settings
Ping, this is still blocked waiting on a new github repo
Hey @austinlparker thanks for getting the GCP project setup. Can you add me ([email protected]) to the project as well.
Opened https://github.com/open-telemetry/govanityurls/pull/1 to update the new fork repo to go 1.22
Next step will be testing the deployment in the GCP project
Then update https://github.com/open-telemetry/opentelemetry-go-vanityurls/
- Point to new github project
- Point to new GCP project
@MrAlias I was able to add you as an Owner (hope you don't mind, @austinlparker). Can you work on linking the existing CircleCI/github repo (https://github.com/open-telemetry/opentelemetry-go-vanityurls) to this project?