Download URLs for opentelemetry artifacts

Open svrnm opened this issue 1 year ago • 21 comments

While many language SDKs are installed via their respective package managers, we have a set of projects that produce artifacts that are downloaded by end-users via GitHub. Some of them are

  • OpenTelemetry Collector (core + contrib)
  • OpenTelemetry Collector Builder (ocb)
  • OpenTelemetry Java Agent
  • OpenTelemetry .NET Autoinstrumentation

Right now those artifacts are served via GitHub and end users need to pull them from URLs like

https://github.com/open-telemetry/opentelemetry-collector/releases/download/cmd%2Fbuilder%2Fv0.95.0/ocb_0.95.0_linux_amd64

Those URLs have 2 issues:

  • In docs (and probably some other places) this leads to code blocks that are hard to read/require unnecessary line breaks.
  • We cannot get centralized insights into how often each artifact is downloaded.

As proposed by @austinlparker and discussed in https://github.com/open-telemetry/opentelemetry.io/issues/4079 we would like to give scarf.sh a try, which can turn the URL above into something like

https://get.opentelemetry.io/ocb_0.95.0_linux_amd64

I raise this community issue, because to do so I would need some support from different SIGs:

  • @open-telemetry/governance-committee & @open-telemetry/technical-committee to check whether this is a fit for our community (note that scarf is vetted by LF, see The Linux Foundation is Partnering With Scarf for OSS Usage Analytics)
  • @open-telemetry/sig-security-maintainers to review scarf and see if there are any security concerns we need to get out of the way (or if there are any blockers)
  • @open-telemetry/collector-maintainers, @open-telemetry/java-instrumentation-maintainers, @open-telemetry/dotnet-instrumentation-maintainers to take a look if they are OK with that for their artifacts

I can and will create issues in SIGs repositories as needed.


Notes:

  • Scarf can also be used for docker images, e.g. fluent is using that already: https://docs.fluentbit.io/manual/installation/docker
  • For the "shorter URLs" we can implement something in the docs repository as well, but this would come without analytics and with a lot more maintenance and setup effort.

svrnm avatar Mar 05 '24 19:03 svrnm

I was finally able to get to this. All in all, I'm happy with Scarf, but there's one thing I would recommend before adopting it: prepare a plan B. In case Scarf goes down for longer periods of time, we should be ready to switch to this plan B. In the worst case, the proxy itself can be implemented in a few lines of Go, but we need to be able to run this proxy somewhere, even if temporarily. This could be something for the SIG Tooling to work on.

Here are some notes for reference:

  • Scarf will issue redirects for file downloads, like the ocb example given by @svrnm
  • Scarf will act as a reverse proxy for container images
  • Scarf claims to respect Do Not Track headers

I believe our configuration on scarf.sh has changed so that the correct URL to download the latest ocb would be:

https://get.opentelemetry.io/0.105.0/linux/amd64/ocb

And it resulted in the following redirect:

< HTTP/2 302 
< date: Thu, 18 Jul 2024 11:52:47 GMT
< location: https://github.com/open-telemetry/opentelemetry-collector/releases/download/cmd%2Fbuilder%2Fv0.105.0/ocb_0.105.0_linux_amd64
< strict-transport-security: max-age=15724800; includeSubDomains
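
For reference, the redirect above (and the "few lines of Go" fallback mentioned earlier) could be sketched roughly like this. This is only an illustration, not what Scarf runs: the function names, the listen address, and the decision to handle only ocb are assumptions, and the target pattern is copied from the `location` header above.

```go
package main

import (
	"fmt"
	"log"
	"net/http"
	"strings"
)

// ocbReleaseURL builds the GitHub release URL for an ocb binary.
// The pattern mirrors the redirect target observed above; other
// artifacts would need their own patterns.
func ocbReleaseURL(version, goos, arch string) string {
	return fmt.Sprintf(
		"https://github.com/open-telemetry/opentelemetry-collector/releases/download/cmd%%2Fbuilder%%2Fv%s/ocb_%s_%s_%s",
		version, version, goos, arch)
}

// redirectOCB answers paths of the form /<version>/<os>/<arch>/ocb
// with a 302 to the matching GitHub release asset.
func redirectOCB(w http.ResponseWriter, r *http.Request) {
	parts := strings.Split(strings.Trim(r.URL.Path, "/"), "/")
	if len(parts) != 4 || parts[3] != "ocb" {
		http.NotFound(w, r)
		return
	}
	for _, p := range parts {
		if p == "" {
			http.NotFound(w, r)
			return
		}
	}
	http.Redirect(w, r, ocbReleaseURL(parts[0], parts[1], parts[2]), http.StatusFound)
}

func main() {
	http.HandleFunc("/", redirectOCB)
	// The listen address is an arbitrary choice for local testing.
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```

With this running locally, `curl -v http://localhost:8080/0.105.0/linux/amd64/ocb` should return a 302 whose `location` matches the one shown above. It does not check that the requested version or platform actually exists.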

jpkrohling avatar Jul 18 '24 12:07 jpkrohling

And a personal request: if we decide to use it for container images as well, can we use "cr" as the subdomain, instead of docker? Docker is one specific technology (and company), while "cr" is "container registry", as used elsewhere as well.

jpkrohling avatar Jul 18 '24 12:07 jpkrohling

Yeah, we could make it whatever. download.opentelemetry.io? packages.opentelemetry.io?

austinlparker avatar Jul 18 '24 12:07 austinlparker

I like get.opentelemetry.io for the files, and cr.opentelemetry.io (or containers.opentelemetry.io) for containers, as we might have other packages in the future (npm, for instance).

jpkrohling avatar Jul 18 '24 12:07 jpkrohling

before adopting it: prepare a plan B. In case Scarf goes down for longer periods of time, we should be ready to switch to this plan B. In the worst case, the proxy itself can be implemented in a few lines of Go, but we need to be able to run this proxy somewhere, even if temporarily.

I thought about that potential plan B for a little bit, here is a proposal (and I would like @chalin to also take a look): we use the website (specifically netlify) by writing redirects into the netlify.toml, e.g.

[[redirects]]
from = "https://get.opentelemetry.io/:version/:os/:arch/ocb"
to = "https://github.com/open-telemetry/opentelemetry-collector/releases/download/cmd%2Fbuilder%2Fv:version/ocb_:version_:os_:arch"

This provides functionality very similar to scarf's (minus the analytics).

svrnm avatar Jul 22 '24 13:07 svrnm

A few thoughts:

  • If we're going to do that, why not make this plan B our plan A? I'd rather not have to introduce another (analytics++) tool if we can avoid it.
  • Also, does it need to be a subdomain? Why not use, for example, https://opentelemetry.io/download/:version/:os/:arch/ocb
  • Btw, I'd rather that the redirects be programmed via the _redirects file, rather than the Netlify config file.

If y'all agree, then we could incrementally implement this Netlify-based redirects approach, without a need for a fallback plan B. WDYT?

chalin avatar Jul 23 '24 23:07 chalin

@chalin, good point! I think one reason for having scarf.sh is exactly the analytics part. For me the short URLs are the main reason to have a solution.

svrnm avatar Jul 24 '24 13:07 svrnm

So you want to switch from GA4 to Scarf.sh for analytics? (If so, maybe we can move that discussion to another thread?) Does anyone have enough experience with the use of Scarf.sh for the purpose of analytics? (I'll ask internally.)

chalin avatar Jul 24 '24 22:07 chalin

No, this is not about switching from GA4 to scarf.sh. In this particular use case GA4 is not going to track anything, since these download URLs do not result in any HTML being downloaded or any JavaScript being executed.

We probably could use netlify logs or something as an alternative, but if analytics of downloads is important to us, scarf.sh (since it is LF/CNCF "approved") is the easiest thing to do.

svrnm avatar Jul 25 '24 10:07 svrnm

Following up on this, netlify has analytics capabilities via server side logs, which if we go with the redirect option probably provides similar functionality: https://docs.netlify.com/monitor-sites/site-analytics/

svrnm avatar Aug 12 '24 13:08 svrnm

Note that Scarf would also proxy the container images. During my review, I saw that they don't do a simple redirect of the container images, but rather, have a proper proxy in place especially to handle the authentication. That's the reason I suggested a Go application serving as proxy. For the cases where scarf issues a redirect, plain redirects at netlify would certainly work.
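
To make the difference from a plain redirect concrete, here is a minimal sketch of a pass-through proxy using Go's standard `httputil.ReverseProxy`. The upstream `registry.example.com`, the function name, and the listen address are placeholders I made up. The sketch deliberately ignores the registry authentication flow, which is the hard part that Scarf handles.

```go
package main

import (
	"fmt"
	"log"
	"net/http"
	"net/http/httputil"
	"net/url"
)

// newRegistryProxy returns a reverse proxy that forwards every request
// to the given upstream instead of redirecting the client to it.
func newRegistryProxy(upstream string) (*httputil.ReverseProxy, error) {
	target, err := url.Parse(upstream)
	if err != nil {
		return nil, err
	}
	if target.Scheme == "" || target.Host == "" {
		return nil, fmt.Errorf("upstream %q must be an absolute URL", upstream)
	}
	proxy := httputil.NewSingleHostReverseProxy(target)
	director := proxy.Director
	proxy.Director = func(r *http.Request) {
		director(r)
		// Send the upstream's own host name so that name-based
		// routing on the upstream side still works.
		r.Host = target.Host
	}
	return proxy, nil
}

func main() {
	// Hypothetical upstream; replace with the real registry host.
	proxy, err := newRegistryProxy("https://registry.example.com")
	if err != nil {
		log.Fatal(err)
	}
	// The listen address is an arbitrary choice for local testing.
	log.Fatal(http.ListenAndServe(":8080", proxy))
}
```

Unlike the redirect case, all image bytes flow through whatever runs this, so bandwidth and capacity become our problem.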

jpkrohling avatar Aug 13 '24 10:08 jpkrohling

Note that netlify is able to do redirects as well, I used them for the go.opentelemetry.io prototype:

https://docs.netlify.com/routing/redirects/

I was not aware that a proxy is needed for docker images (I assume there is a reason why they do that). This of course raises the question about required capacity. I could imagine this is quickly going into some 100GBs?

svrnm avatar Aug 13 '24 13:08 svrnm

This of course raises the question about required capacity

They have a page explaining that, but it's related to how auth works for Docker's registry.

When a user requests a Docker container image through Scarf, Scarf simply issues a redirect response, pointing to whichever hosting provider you've configured for your container. Certain container runtimes do not handle redirects appropriately during authentication (which is required even for anonymous pulls), and, in those cases, Scarf will proxy the request to the host instead of redirecting.

https://docs.scarf.sh/gateway/#how-it-works

jpkrohling avatar Aug 13 '24 14:08 jpkrohling

This of course raises the question about required capacity

They have a page explaining that, but it's related to how auth works for Docker's registry.

When a user requests a Docker container image through Scarf, Scarf simply issues a redirect response, pointing to whichever hosting provider you've configured for your container. Certain container runtimes do not handle redirects appropriately during authentication (which is required even for anonymous pulls), and, in those cases, Scarf will proxy the request to the host instead of redirecting.

This is the one compelling reason for scarf: they figured that part out and probably also make sure that this works with registries across the board, while this would be our own responsibility if we go through hugo+netlify.

svrnm avatar Aug 14 '24 12:08 svrnm