scmrepo icon indicating copy to clipboard operation
scmrepo copied to clipboard

support `http.extraheader`

Open pmrowla opened this issue 2 years ago • 24 comments

https://github.com/jelmer/dulwich/issues/882

in libgit2/pygit2 this requires exposing git_fetch_options.extra_headers in the pygit remote callbacks

pmrowla avatar Jul 29 '23 01:07 pmrowla

@pmrowla What is the level of effort to support http.extraheader? Thinking about this some more, it would be nice to have that support because I think we increasingly have valid CI use cases that don't require CML at all, and avoiding both cml ci and iterative/setup-cml would be preferable.

dberenbaum avatar Jul 31 '23 14:07 dberenbaum

@dberenbaum I think it would probably take me about a full sprint. One thing to note here is that the git http. config options also support a lot of ways to do url substitution which also get used in a lot of the CI platforms and I'm not sure if any of them are actually supported in either dulwich or libgit2

pmrowla avatar Aug 01 '23 03:08 pmrowla

Discussed that this isn't a priority for now given the level of effort and ability to handle this in iterative/setup-cml and/or iterative/setup-dvc (and in cml ci)

dberenbaum avatar Aug 01 '23 15:08 dberenbaum

For the record we hit the same issue with gto here: https://github.com/iterative/gto/issues/277

Also, AFAIU cml ci requires REPO_TOKEN to be set which is an extra items - security issues, and overall requires some setup.

shcheklein avatar Sep 14 '23 23:09 shcheklein

@shcheklein Can we recommend to include setup-dvc and/or setup-cml when using our tools inside GHA?

dberenbaum avatar Sep 15 '23 17:09 dberenbaum

@dacbd Do you know what exactly are the requirements to include in a GHA workflow in order to push tags via gto?

dberenbaum avatar Sep 15 '23 17:09 dberenbaum

@shcheklein Can we recommend to include setup-dvc and/or setup-cml when using our tools inside GHA?

we probably can. But it also means we'll need to do the same with gto for example which becomes a bit heavy and unexpected (at least by its name - setup-cml, setup-dvc).

Also, by biggest concern that it requires and extra token to be set - REPO_TOKEN AFAIU.

shcheklein avatar Sep 15 '23 18:09 shcheklein

we probably can. But it also means we'll need to do the same with gto for example which becomes a bit heavy and unexpected (at least by its name - setup-cml, setup-dvc).

I can't find it ATM, but we had some discussion about whether we should consolidate to use a single action everywhere like setup-dvc even when using cml, gto, etc., or to have a setup-gto that would take care of this.

Also, by biggest concern that it requires and extra token to be set - REPO_TOKEN AFAIU.

@dacbd To confirm, is it required to include this?

        env:
          REPO_TOKEN: ${{ secrets.GITHUB_TOKEN }}

@shcheklein Is your main concern about including this or managing the secret in GH? I think it only uses the auto-generated secrets.GITHUB_TOKEN, so it does not require setting up a secret manually AFAIK.

dberenbaum avatar Sep 15 '23 19:09 dberenbaum

I think it only uses the auto-generated secrets.GITHUB_TOKEN, so it does not require setting up a secret manually AFAIK.

If that's the case, it less important to address then. I was not sure (and I'm still not) about this. I tried to use it here https://github.com/shcheklein/PR-Merge-Register and the action failed with:

{"level":"error","message":"token not found","stack":"Error: token not found\n    at new Github (/usr/local/lib/node_modules/@dvcorg/cml/src/drivers/github.js:91:23)\n    at CML.getDriver (/usr/local/lib/node_modules/@dvcorg/cml/src/cml.js:161:35)\n    at CML.ci (/usr/local/lib/node_modules/@dvcorg/cml/src/cml.js:487:25)\n    at exports.handler (/usr/local/lib/node_modules/@dvcorg/cml/bin/cml/repo/prepare.js:13:13)\n    at /usr/local/lib/node_modules/@dvcorg/cml/node_modules/yargs/build/index.cjs:1:8993\n    at /usr/local/lib/node_modules/@dvcorg/cml/node_modules/yargs/build/index.cjs:1:4949"}

I haven't had to time dig further tbh so see if it's about token being required or something else.

shcheklein avatar Sep 15 '23 20:09 shcheklein

Can we just add the token rewriting patch to scmrepo if it detects its running in github actions? Until there is proper support for http.extraheader

I try and do a draft PR so you can see what it might look like.

@dacbd Do you know what exactly are the requirements to include in a GHA workflow in order to push tags via gto?

Just need to include a custom auth header when pushing git refs over https AUTHORIZATION: basic ***

dacbd avatar Sep 15 '23 20:09 dacbd

@dacbd

Can we just add the token rewriting patch to scmrepo if it detects its running in github actions?

will it be rewriting the token that GH provides?

shcheklein avatar Sep 15 '23 20:09 shcheklein

@shcheklein no, we just need to pull token out of the git config entry and add it in the git origin as a username/password.

approx: https://token:ghs_***@github.com/org/repo.git'

dacbd avatar Sep 15 '23 21:09 dacbd

@dacbd I guess makes sense to me. I would rely on @pmrowla judgement on this. Practically it's a patching it upstream vs implementing a GH-specific path on scmrepo (probably a temporary solution). My 2cs - if it's simple and quick it's fine to have it.

shcheklein avatar Sep 15 '23 21:09 shcheklein

sounds good, @pmrowla to save you a click on how did this.

https://github.com/iterative/setup-dvc/blob/d549770196ed130ba41697c4713281c349917d5e/src/utils.js#L43-L63

dacbd avatar Sep 15 '23 21:09 dacbd

I don't think a GHA specific patch belongs in scmrepo, iirc some other users encountered a similar issue with aws codecommit and whatever their CI equivalent is.

If this is a priority we should just fix it properly and add support for http.extraheader so it works everywhere.

pmrowla avatar Sep 16 '23 00:09 pmrowla

I don't think a GH specific patch belongs in scmrepo, iirc some other users encountered a similar issue with aws codecommit and whatever their CI equivalent is.

can it be generalized on the scmrepo level (e.g. if it's always about the same stuff as on GH)?

would it be faster compared to changing it upstream? Clearly it's the right way to do this, if it takes let's say 10x more time then it might make sense to consider alternatives.

shcheklein avatar Sep 16 '23 00:09 shcheklein

It can't be generalized at the scmrepo level since the headers have to be set in the actual HTTP requests (that are crafted entirely in the upstream git backends).

The GHA example would work purely in scmrepo since the way GHA uses it ends up just setting the HTTP basic Authorization header and you can accomplish the same thing by rewriting the entire git remote URL to fill in http://username:[email protected]/... (with the gha token instead as password)

But the extraheader really lets you set any HTTP header you want, so in the event a git forge uses it to set something other than Authorization, we can't accomplish it via rewriting the remote URL (and have to set the raw header in the HTTP request)

pmrowla avatar Sep 16 '23 00:09 pmrowla

Got it. Have you seen other (more or less meaningful, production) use case for the extraheader besides Auth?

shcheklein avatar Sep 16 '23 00:09 shcheklein

Azure pipelines CI uses it to set the Authorization header to a bearer token (so it sets Authorization: Bearer <token string>. (the URL username:password rewrite workaround only works for setting Authorization: Basic <base64 encoded username:password>)

pmrowla avatar Sep 16 '23 00:09 pmrowla

Can GTO fallback to using CLI Git?

dberenbaum avatar Sep 17 '23 19:09 dberenbaum

We could make GTO init the scmrepo instances with gitpython as the highest priority backend, but that's still not ideal since it means DVC will end up using gitpython when it calls into GTO.

pmrowla avatar Sep 18 '23 01:09 pmrowla

What about trying to prioritize gitpython in scmrepo if we detect we are in GitHub? Fixing in dulwich sounds best but the full sprint estimate makes it hard to prioritize.

dberenbaum avatar Sep 18 '23 18:09 dberenbaum