Proposal: Supporting more artifact retrieval methods

Open endzyme opened this issue 2 years ago • 18 comments

Related to #684 and #816


I am looking for some initial implementation feedback on supporting private GitHub repository releases as well as a few other potential artifact repositories.

The intent here is to support some way of running custom commands, which return an artifact file via stdout. This could solve #684 by running something like gh api -H Accept:application/octet-stream /repos/kubernetes-sigs/krew/releases/assets/55894121 to download artifacts from a private GitHub repo's release assets. It could also allow people to have their own private krew indexes that rely on local machine tooling to get the artifacts. I'm thinking of example commands like aws s3 cp s3://my-fancy-private-bucket/my-cool-krew-plugin.tgz -. This type of support would delegate authentication concerns to the local machine running the installation.

I initially wanted to implement this as an S3 SDK scheme but decided it would be better to enable more options via running commands on the local machine. Happy to discuss alternative approaches here too.

Early feedback request on the design

Keep in mind my testing and work is pretty rough and just an initial concept.

  1. Should this implementation bloat the uri spec key to support other "schemes" (such as file:// and cmd://) or should it really be a separate key in the plugin manifest spec? (Providing string like cmd://run-my-super-sweet-script -i thing -o otherthing just feels kinda not ideal.)

  2. Should this leverage temp files on the file system to "buffer" the stdout from the "artifact download command"? I initially attempted to use exec.Command() and exec.Command().StdoutPipe() but hit issues with the pipe closing before the stream could be read (appears to be how exec.Command() is intended to function).

  3. Is this introducing too much of a security risk, allowing manifests to effectively execute arbitrary code via krew when installing plugins?
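On question 2, the pipe-closing behavior is expected: in Go's os/exec, Wait closes the pipe returned by StdoutPipe once the command exits, so all reads have to finish before Wait is called. Below is a minimal sketch of that ordering, spooling stdout into a temp file. The helper name spoolCommandOutput and the temp file pattern are assumptions made for illustration; this is not code from krew or from the PR.

```go
package main

import (
	"fmt"
	"io"
	"os"
	"os/exec"
)

// spoolCommandOutput is a hypothetical helper (not krew code). It starts argv
// directly, without a shell, copies the command's stdout into a temp file,
// and only then calls Wait, because Wait closes the StdoutPipe reader.
func spoolCommandOutput(argv []string) (string, error) {
	if len(argv) == 0 {
		return "", fmt.Errorf("empty command")
	}
	cmd := exec.Command(argv[0], argv[1:]...)
	cmd.Stderr = os.Stderr
	stdout, err := cmd.StdoutPipe()
	if err != nil {
		return "", err
	}
	tmp, err := os.CreateTemp("", "krew-artifact-*")
	if err != nil {
		return "", err
	}
	if err := cmd.Start(); err != nil {
		tmp.Close()
		os.Remove(tmp.Name())
		return "", fmt.Errorf("starting %q: %w", argv[0], err)
	}
	// Drain the pipe completely before Wait; calling Wait first can close
	// the pipe while data is still unread.
	_, copyErr := io.Copy(tmp, stdout)
	waitErr := cmd.Wait()
	closeErr := tmp.Close()
	for _, e := range []error{copyErr, waitErr, closeErr} {
		if e != nil {
			os.Remove(tmp.Name())
			return "", fmt.Errorf("command %q failed: %w", argv[0], e)
		}
	}
	return tmp.Name(), nil
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: spool <command> [args...]")
		return
	}
	path, err := spoolCommandOutput(os.Args[1:])
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("artifact spooled to", path)
}
```

An alternative that avoids StdoutPipe altogether is to assign the temp file to cmd.Stdout and call Run, which returns only after the output has been written.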

Below is an example plugin manifest using this PR to give an example of how this is written right now.

apiVersion: krew.googlecontainertools.github.com/v1alpha2
kind: Plugin
metadata:
  name: testfile
spec:
  version: "v0.0.1"
  shortDescription: Throwin testfiles
  caveats: |
    Something something testfile
  platforms:
  - uri: cmd://gh api -H Accept:application/octet-stream /repos/kubernetes-sigs/krew/releases/assets/55894121
    sha256: 5df32eaa0e888a2566439c4ccb2ef3a3e6e89522f2f2126030171e2585585e4f
    bin: testfile
    files:
    - from: ./krew-linux_amd64
      to: testfile
    selector:
      matchExpressions:
      - key: "os"
        operator: "In"
        values:
        - linux
        - darwin

endzyme avatar Dec 02 '22 16:12 endzyme

@ahmetb Sorry for the rather lazy approach of starting the discussion by pasting the PR initial comment. If you'd like a different method of starting the design discussion, let me know what you expect and I can write up a proposal.

endzyme avatar Dec 02 '22 16:12 endzyme

@ahmetb Anything I can help with to get the ball rolling here on what you need to have a design discussion? Happy to connect in the Kubernetes Slack too. :smile: Happy Holidays and hope you have a great New Year

endzyme avatar Dec 29 '22 20:12 endzyme

i think points 1 and 3 you made are definitely important. i dont like the idea of executing an arbitrary command to get the release assets because its a security risk like you said.

i think homebrew has specific release strategies for downloading assets from non-public github urls (private github repo, s3, etc). maybe something like that could work here as well. speaking about private github specifically, i think if we kept the uri as a url to a private release asset then you have to add a Authorization: token ... header to the request, which presents another challenge of how krew should get that token. a predefined env var would be possible but that adds extra required knowledge on the users.

digging through the github api a bit and its a little difficult to download a release asset from a private repo. the url looks like and i dont see a way to download a release asset by name. i had to first get the release by name and parse out the asset id from that response. maybe there's an easier way though. if this is the way it has to be done then i think we would have to either:

  1. put the responsibility on the users to get the correct api url with asset id
  2. use the combination of owner, repo, and asset name to query the github api and download the asset (might involve changing the plugin schema which would not be great). another thing is if people are self hosting github then the api url will be different [1]

not a huge fan of either of those but option 2 would be a better user experience. another possibly annoying thing is that the access token thats used would have to have repo permissions in all the plugin repos for that custom index.

[1] another issue i ran into while playing around with this: my company uses self hosted github enterprise + okta sso. the typical url used for assets in plugin manifests (github-url/owner/repo/releases/download/...) doesnt actually work since that redirects to an okta page. figure id jot this down here since i think others might have similar issues when self hosting github. the plugins would need to specify the api url and the download would need to be updated
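The two-step lookup described in this comment (fetch the release by tag, pick out the asset id, then request the asset as an octet stream) could be sketched roughly as follows. The helper names findAssetID and assetRequest are hypothetical, and the JSON in main is made-up sample data; only the REST paths (releases/tags/{tag} and releases/assets/{asset_id}) and the Accept header come from the GitHub API. The apiBase parameter is there because self-hosted instances expose the API under a different base URL.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
)

// release holds the only fields this sketch needs from the response of
// GET {apiBase}/repos/{owner}/{repo}/releases/tags/{tag}.
type release struct {
	Assets []struct {
		ID   int64  `json:"id"`
		Name string `json:"name"`
	} `json:"assets"`
}

// findAssetID (hypothetical helper) returns the id of the asset with the
// given file name from a release JSON document.
func findAssetID(releaseJSON []byte, name string) (int64, error) {
	var r release
	if err := json.Unmarshal(releaseJSON, &r); err != nil {
		return 0, err
	}
	for _, a := range r.Assets {
		if a.Name == name {
			return a.ID, nil
		}
	}
	return 0, fmt.Errorf("asset %q not found in release", name)
}

// assetRequest (hypothetical helper) builds the download request for one
// asset. How the token reaches krew is left open, as in the discussion.
func assetRequest(apiBase, owner, repo string, id int64, token string) (*http.Request, error) {
	url := fmt.Sprintf("%s/repos/%s/%s/releases/assets/%d", apiBase, owner, repo, id)
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return nil, err
	}
	// Without this header the API returns asset metadata instead of bytes.
	req.Header.Set("Accept", "application/octet-stream")
	if token != "" {
		req.Header.Set("Authorization", "Bearer "+token)
	}
	return req, nil
}

func main() {
	// Sample data for illustration only.
	sample := []byte(`{"assets":[{"id":55894121,"name":"plugin.tar.gz"}]}`)
	id, err := findAssetID(sample, "plugin.tar.gz")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	req, err := assetRequest("https://api.github.com", "kubernetes-sigs", "krew", id, "")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(req.Method, req.URL)
}
```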

chriskim06 avatar Jan 10 '23 18:01 chriskim06

Most package managers I've encountered that handle this have allowed me to specify a url to the source. In the case of a git repo, it is just the git repo, and a ref.

source = {
  url = 'git+<owner>/<repo>.git',
  tag = 'v2.2.1',
}

where the protocol of the url tells the manager how it is supposed to get the asset.


tells you the asset lives on a github release for example, which means the repo could be private and needs some additional information. Most github things have standardized around having a GITHUB_TOKEN environment variable set on the machine running the command to handle authenticated requests.

esatterwhite avatar Feb 10 '23 15:02 esatterwhite

@esatterwhite, do you have any examples you can link here? I'm happy to keep driving this forward a little, but I want to see if the examples have use cases for other schemes or protocols like s3/gcs or others. Thanks!

endzyme avatar Feb 12 '23 00:02 endzyme

dug around homebrew a bit and they are also using the github api with the token defined in an env var i think. im not exactly sure how a formula defines that the artifact to install is in a private repo though (and homebrew also allows people to define how something is installed instead of just providing a binary to download).

i think if we are going to support this we'll need to use the api as well and come up with a good design around some of the comments i made earlier

chriskim06 avatar Feb 12 '23 01:02 chriskim06


esatterwhite avatar Feb 12 '23 16:02 esatterwhite

@esatterwhite that looks like stuff to download a private repo (which i believe works with the custom indexes feature in krew). the problem we have right now is if a plugin manifest defines a binary from a release in a private repo. i havent figured out a way to download an asset using the{owner}/{repo}/releases/download/{tag}/{artifact} url. i think we need the github api in order to do that which would require some changes to the plugin schema i think and that requires some design

chriskim06 avatar Feb 12 '23 17:02 chriskim06

i think the main points of concern are

  • making sure that existing manifests dont need to be updated
  • doing this in a way that is extensible to other potential artifact sources
  • how to handle authentication credentials

with that being said here are some very preliminary thoughts i had:

we could add a new Release type that implements the download.Fetcher interface and include this in the Plugin type. Release would need to implement the UnmarshalJSON method and use json.RawMessage to allow unmarshaling to different types (in case we support other download options like s3 in the future). a release would look like

- selector:
    type: github
      baseURL: ...
      repo: ...
      release: ...
      asset: ...

the implementation of these different ways to download releases would just need to implement Get(string) (io.ReadCloser, error), could have different schemas themselves, and would need to update the Release type's implementation of UnmarshalJSON.
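A rough sketch of the type-switching UnmarshalJSON idea, under stated assumptions: the Fetcher interface below only mirrors the Get(string) (io.ReadCloser, error) signature quoted above rather than importing krew's package, the "type"/"spec" envelope keys are guesses at a possible schema, and GitHubRelease with its stubbed Get is a placeholder, not a proposed implementation.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
)

// Fetcher mirrors the method signature quoted in the comment above.
type Fetcher interface {
	Get(uri string) (io.ReadCloser, error)
}

// GitHubRelease is a hypothetical source type; field names follow the
// example keys in the comment (baseURL, repo, release, asset).
type GitHubRelease struct {
	BaseURL string `json:"baseURL"`
	Repo    string `json:"repo"`
	Release string `json:"release"`
	Asset   string `json:"asset"`
}

// Get is a stub; a real version would call the GitHub API.
func (g *GitHubRelease) Get(string) (io.ReadCloser, error) {
	return nil, fmt.Errorf("github download is not implemented in this sketch")
}

// Release wraps whichever concrete Fetcher the manifest selects.
type Release struct {
	Type string
	Fetcher
}

// UnmarshalJSON reads the type tag first and defers decoding of the
// type-specific part by holding it as json.RawMessage.
func (r *Release) UnmarshalJSON(data []byte) error {
	var envelope struct {
		Type string          `json:"type"`
		Spec json.RawMessage `json:"spec"` // assumed key name
	}
	if err := json.Unmarshal(data, &envelope); err != nil {
		return err
	}
	switch envelope.Type {
	case "github":
		var gh GitHubRelease
		if err := json.Unmarshal(envelope.Spec, &gh); err != nil {
			return err
		}
		r.Type, r.Fetcher = envelope.Type, &gh
	default:
		// New sources (for example s3) would add a case here.
		return fmt.Errorf("unsupported release type %q", envelope.Type)
	}
	return nil
}

func main() {
	doc := []byte(`{"type":"github","spec":{"repo":"owner/repo","release":"v0.0.1","asset":"plugin.tar.gz"}}`)
	var r Release
	if err := json.Unmarshal(doc, &r); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("decoded %s release: %+v\n", r.Type, r.Fetcher)
}
```

Rejecting unknown types in the default branch is one way to get the "add/deny sources case by case" behavior discussed later in the thread.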

the rough idea above doesnt address how the github token is picked up by krew. that is probably one of the more difficult points to address with all this. we've talked about potentially having a config file in the past but we havent really needed one afaik. an env var is much simpler from a dev perspective but adds a bit of extra required knowledge on the user's part (although i dont think much and is a common paradigm with other tools).

@ahmetb curious to hear your take on all this.

chriskim06 avatar Feb 16 '23 18:02 chriskim06

I think this is a fine approach. In terms of a new release type, it could easily be adapted into a plugin ecosystem if needed. The plugin docs for different doc types could either warn users of missing env vars or other things.

My only hesitation is calling the top-level spec key release; maybe refer to it as artifact, installer, or similar.

endzyme avatar Feb 18 '23 18:02 endzyme

I think it's probably better to go from concrete use cases to solutions in this scenario. Several use cases I'm hearing in this post are:

  1. I want to let my employees install kubectl plugins (from my private artifact storage): I'm not too convinced that we should try to accommodate this use case.

    Most companies have corporate IT tooling to install binaries on employee machines. Since most plugins are single Go binaries, they should use their existing corp machine management tooling to deliver and update these binaries and not need Krew.

  2. I want to share my plugin from a private artifact repository (superset of 1): This comes down to where the binary is and how it is authenticated. It might be OK to roughly categorize the artifact sources as:

    • Blob is not served over HTTP(S), e.g. ftp or custom curl command: inclined to not accommodate this
    • Blob is on a private HTTPS endpoint with "custom" authentication scheme: For example, the "signed blob" URLs seen in S3 or Azure. This means we need to reformat the URL, it's not ideal.
    • Blob is authenticated via an Authorization header: e.g. AWS S3 can use an "Authorization: AWS xxx" scheme, but it
    • Blob is authenticated via an Authorization: Bearer xxx header: (What kubectl auth exec plugins do): it can accommodate endpoints like Google Cloud Storage or GitHub API.

By far, the last format (Bearer token) is the easiest to support and may be covering the most ground [citation needed] but as soon as we add this, we'll probably get someone from a company saying they use Kerberos/NTLM for authentication (which requires roundtripping the http requests with a sophisticated negotiation protocol like SPNEGO/GSSAPI) and we'd be pulled in that direction (and we probably should have a hard line to not support such requests).

Also keep in mind, if a company can deliver something like an auth plugin that's exec'ed on a machine, they might as well deliver the kubectl plugin with the same mechanism, too. :) So this would be only interesting if the user already has CLI tools like gh or gcloud where we can call gh auth token or gcloud auth print-access-token and get a token out of it.
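For the Bearer case, obtaining a token from an already-installed CLI could look something like the sketch below. tokenFromCommand and newBearerRequest are hypothetical names, and the URL in main is a placeholder; the only external commands assumed are the ones named above (gh auth token, gcloud auth print-access-token), both of which print a token on stdout.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
	"os/exec"
	"strings"
)

// tokenFromCommand (hypothetical helper) runs a token-printing CLI without
// a shell and returns its trimmed stdout.
func tokenFromCommand(argv ...string) (string, error) {
	if len(argv) == 0 {
		return "", errors.New("no token command configured")
	}
	out, err := exec.Command(argv[0], argv[1:]...).Output()
	if err != nil {
		return "", fmt.Errorf("running %q: %w", argv[0], err)
	}
	token := strings.TrimSpace(string(out))
	if token == "" {
		return "", fmt.Errorf("%q printed no token", argv[0])
	}
	return token, nil
}

// newBearerRequest (hypothetical helper) attaches the token as an
// Authorization: Bearer header on a GET request.
func newBearerRequest(url, token string) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Authorization", "Bearer "+token)
	return req, nil
}

func main() {
	token, err := tokenFromCommand("gh", "auth", "token")
	if err != nil {
		fmt.Println("could not obtain token:", err)
		return
	}
	// Placeholder URL; nothing is sent in this sketch.
	req, err := newBearerRequest("https://example.invalid/plugin.tar.gz", token)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("would send", req.Method, req.URL, "with a bearer token")
}
```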

ahmetb avatar Feb 19 '23 00:02 ahmetb

We want to set up our own krew plugin index in our GitHub org, which is private and pulls artifacts off of private GitHub releases. Very similar to what krew is doing already. That is the use case.

esatterwhite avatar Feb 19 '23 02:02 esatterwhite

kubectl provides an elegant albeit complex method for authentication => kubelogin. In fact, krew can install kubelogin plugins (see oidc-login). If krew took a similar approach, the user experience might look like

$ kubectl krew index add myindex
WARNING: You have added a new index from ""
The plugins in this index are not audited for security by the Krew maintainers.
Install them at your own risk.

Some of the artifacts in this index require the 'krew-download-github'. The plugin was not found.

Then the user can install that plugin. Most of these can be public.

$ kubectl krew install krew-download-github

When a kubectl plugin download requires a download extension, krew executes the plugin. A zero exit code would mean that the data written to stdout is the complete artifact. A non-zero exit code would mean the download failed, and the output on stderr could give the user more insight into the issue.

The above experience would ensure that all plugins are verified via sha256 hashes. Social evidence also shows that executing subcommands is acceptable (see kubelogin). Giving arguments to the plugins (defined in the index) would also not expand the security surface area as long as krew does not interpret the arguments in a shell. In kubeconfig, the commands are defined by an array of strings and not passed to a shell but to exec.Command.

The above approach would cover downloading artifacts FROM ANY source over any protocol provided that the end user is willing to install kubectl plugins that enable those plugins. Presumably, they are already comfortable with that, or they would not be using krew in the first place. Concerns about corporate policies involving the installation of software are an orthogonal concern. Installing krew plugins is no different from installing kubectl, chrome, or any other executable on a corporate asset. Those concerns should be deferred to Device Management software solutions and not directly affect the solutions presented here.
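The contract described in this comment (argv passed straight to exec.Command, zero exit means stdout is the whole artifact, non-zero exit surfaces stderr, and the result is checked against the manifest's sha256) could be sketched as below. runDownloader and verifySHA256 are hypothetical names; nothing here is an existing krew API.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"os/exec"
	"strings"
)

// runDownloader (hypothetical) executes a download extension. argv goes to
// exec.Command as-is, so no shell ever interprets the arguments.
func runDownloader(argv []string) ([]byte, error) {
	if len(argv) == 0 {
		return nil, fmt.Errorf("empty downloader command")
	}
	var stdout, stderr bytes.Buffer
	cmd := exec.Command(argv[0], argv[1:]...)
	cmd.Stdout, cmd.Stderr = &stdout, &stderr
	if err := cmd.Run(); err != nil {
		// Non-zero exit or failure to start: pass stderr on to the user.
		return nil, fmt.Errorf("downloader %q failed: %w: %s",
			argv[0], err, strings.TrimSpace(stderr.String()))
	}
	// Zero exit: stdout is treated as the complete artifact.
	return stdout.Bytes(), nil
}

// verifySHA256 (hypothetical) compares the artifact bytes with the hex
// digest from the manifest's sha256 field.
func verifySHA256(artifact []byte, wantHex string) error {
	sum := sha256.Sum256(artifact)
	got := hex.EncodeToString(sum[:])
	if !strings.EqualFold(got, wantHex) {
		return fmt.Errorf("sha256 mismatch: got %s, want %s", got, wantHex)
	}
	return nil
}

func main() {
	// Demo data only: the digest below is the SHA-256 of the bytes "abc".
	artifact := []byte("abc")
	want := "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad"
	if err := verifySHA256(artifact, want); err != nil {
		fmt.Println("rejecting artifact:", err)
		return
	}
	fmt.Println("artifact digest matches manifest")
}
```

Buffering the artifact in memory keeps the sketch short; a real implementation would more likely stream to disk while hashing.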

daniel-garcia avatar Feb 20 '23 18:02 daniel-garcia

On a related note, it's not obvious this is an issue since krew interprets 4xx responses as successful artifact downloads. This can be a real time sink because of the misleading error. #819

daniel-garcia avatar Feb 20 '23 18:02 daniel-garcia

  1. I want to share my plugin from a private artifact repository

i think this is something we should support, especially for private github repos. it should be doable to implement this in a way where we can add/deny new artifact sources on a case by case basis

chriskim06 avatar Feb 23 '23 18:02 chriskim06

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar May 24 '23 18:05 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot avatar Jun 23 '23 19:06 k8s-triage-robot

Has there been any forward progress on this? I know there were a couple of pull requests, but they seem to have stalled. Are we still waiting on agreement around design?

esatterwhite avatar Jul 11 '23 14:07 esatterwhite