Download URLs for OpenTelemetry artifacts
While many language SDKs are installed via their respective package managers, several of our projects produce artifacts that end users download directly from GitHub. Some of them are
- OpenTelemetry Collector (core + contrib)
- OpenTelemetry Collector Builder (ocb)
- OpenTelemetry Java Agent
- OpenTelemetry .NET Autoinstrumentation
Right now those artifacts are served via GitHub and end users need to pull them from URLs like
https://github.com/open-telemetry/opentelemetry-collector/releases/download/cmd%2Fbuilder%2Fv0.95.0/ocb_0.95.0_linux_amd64
Those URLs have two issues:
- In docs (and probably some other places) this leads to code blocks that are hard to read or require unnecessary line breaks.
- We cannot get centralized insights into how often each artifact is downloaded.
As proposed by @austinlparker and discussed in https://github.com/open-telemetry/opentelemetry.io/issues/4079 we would like to give scarf.sh a try, which can turn the URL above into something like
https://get.opentelemetry.io/ocb_0.95.0_linux_amd64
I raise this community issue, because to do so I would need some support from different SIGs:
- @open-telemetry/governance-committee & @open-telemetry/technical-committee to take a look if this is a fit for our community (note that Scarf is vetted by the LF, see "The Linux Foundation is Partnering With Scarf for OSS Usage Analytics")
- @open-telemetry/sig-security-maintainers to review scarf and see if there are any security concerns we need to get out of the way (or if there are any blockers)
- @open-telemetry/collector-maintainers, @open-telemetry/java-instrumentation-maintainers, @open-telemetry/dotnet-instrumentation-maintainers to take a look if they are OK with that for their artifacts
I can and will create issues in SIG repositories as needed.
Notes:
- Scarf can also be used for Docker images; e.g., Fluent Bit is already using it: https://docs.fluentbit.io/manual/installation/docker
- For the "shorter URLs" we can implement something in the docs repository as well, but this would come without analytics and with a lot more maintenance and setup effort.
I was finally able to get to this. All in all, I'm happy with Scarf, but there's one thing I would recommend before adopting it: prepare a plan B. In case Scarf goes down for a longer period of time, we should be ready to switch to it. In the worst case, the proxy itself can be implemented in a few lines of Go, but we need to be able to run this proxy somewhere, even if only temporarily. This could be something for SIG Tooling to work on.
Here are some notes for reference:
- Scarf will issue redirects for file downloads, like the ocb example given by @svrnm
- Scarf will act as a reverse proxy for container images
- Scarf claims to respect Do Not Track headers
I believe our configuration on scarf.sh has changed so that the correct URL to download the latest ocb would be:
https://get.opentelemetry.io/0.105.0/linux/amd64/ocb
And it resulted in the following redirect:
< HTTP/2 302
< date: Thu, 18 Jul 2024 11:52:47 GMT
< location: https://github.com/open-telemetry/opentelemetry-collector/releases/download/cmd%2Fbuilder%2Fv0.105.0/ocb_0.105.0_linux_amd64
< strict-transport-security: max-age=15724800; includeSubDomains
And a personal request: if we decide to use it for container images as well, can we use "cr" as the subdomain, instead of docker? Docker is one specific technology (and company), while "cr" is "container registry", as used elsewhere as well.
Yeah, we could make it whatever. download.opentelemetry.io? packages.opentelemetry.io?
I like get.opentelemetry.io for the files, and cr.opentelemetry.io (or containers.opentelemetry.io) for containers, as we might have other packages in the future (npm, for instance).
before adopting it: prepare a plan B. In case Scarf goes down for a longer period of time, we should be ready to switch to it. In the worst case, the proxy itself can be implemented in a few lines of Go, but we need to be able to run this proxy somewhere, even if only temporarily.
I thought about that potential plan B for a little bit, here is a proposal (and I would like @chalin to also take a look): we use the website (specifically netlify) by writing redirects into the netlify.toml, e.g.
[[redirects]]
from = "https://get.opentelemetry.io/:version/:os/:arch/ocb"
to = "https://github.com/open-telemetry/opentelemetry-collector/releases/download/cmd%2Fbuilder%2Fv:version/ocb_:version_:os_:arch"
This provides functionality very similar to Scarf's (minus the analytics).
A few thoughts:
- If we're going to do that, why not make this plan B our plan A? I'd rather not have to introduce another (analytics++) tool if we can avoid it.
- Also, does it need to be a subdomain? Why not use, for example, https://opentelemetry.io/download/:version/:os/:arch/ocb
- Btw, I'd rather that the redirects be programmed via the _redirects file, rather than the Netlify config file.
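For illustration, the equivalent _redirects entry could look something like the following. This is hypothetical: the path layout mirrors the netlify.toml example above, and whether Netlify placeholders interpolate correctly mid-segment in the target (e.g. ocb_:version_:os_:arch) would need to be verified.

```
# _redirects — hypothetical sketch, one line per artifact mapping
/download/:version/:os/:arch/ocb  https://github.com/open-telemetry/opentelemetry-collector/releases/download/cmd%2Fbuilder%2Fv:version/ocb_:version_:os_:arch  302
```

The trailing 302 makes the redirect temporary, which matches the behavior observed from Scarf above.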
If y'all agree, then we could incrementally implement this Netlify-based redirects approach, without a need for a fallback plan B. WDYT?
@chalin, good point! I think one reason for having scarf.sh is exactly the analytics part. For me, the short URLs are the main reason to have a solution.
So you want to switch from GA4 to Scarf.sh for analytics? (If so, maybe we can move that discussion to another thread?) Does anyone have enough experience with the use of Scarf.sh for the purpose of analytics? (I'll ask internally.)
No, this is not about switching from GA4 to scarf.sh; in this particular use case GA4 is not going to track anything, since these download URLs do not result in any HTML being downloaded or, for that matter, any JavaScript being executed.
We probably could use Netlify logs or something similar as an alternative, but if analytics of downloads is important to us, scarf.sh (since it is LF/CNCF "approved") is the easiest thing to do.
Following up on this, netlify has analytics capabilities via server side logs, which if we go with the redirect option probably provides similar functionality: https://docs.netlify.com/monitor-sites/site-analytics/
Note that Scarf would also proxy the container images. During my review, I saw that they don't do a simple redirect of the container images, but rather, have a proper proxy in place especially to handle the authentication. That's the reason I suggested a Go application serving as proxy. For the cases where scarf issues a redirect, plain redirects at netlify would certainly work.
Note that netlify is able to do redirects as well, I used them for the go.opentelemetry.io prototype:
https://docs.netlify.com/routing/redirects/
I was not aware that a proxy is needed for Docker images (I assume there is a reason why they do that). This of course raises the question of required capacity. I could imagine this quickly going into some hundreds of GBs.
This of course raises the question about required capacity
They have a page explaining that, but it's related to how auth works for Docker's registry.
When a user requests a Docker container image through Scarf, Scarf simply issues a redirect response, pointing to whichever hosting provider you've configured for your container. Certain container runtimes do not handle redirects appropriately during authentication (which is required even for anonymous pulls), and, in those cases, Scarf will proxy the request to the host instead of redirecting.
https://docs.scarf.sh/gateway/#how-it-works
When a user requests a Docker container image through Scarf, Scarf simply issues a redirect response, pointing to whichever hosting provider you've configured for your container. Certain container runtimes do not handle redirects appropriately during authentication (which is required even for anonymous pulls), and, in those cases, Scarf will proxy the request to the host instead of redirecting.
This is the one compelling reason for Scarf: they figured that part out and probably also make sure that this works with registries across the board, whereas this would be our own responsibility if we go with Hugo + Netlify.