go-github icon indicating copy to clipboard operation
go-github copied to clipboard

DownloadContents can't download files from root of repo

Open abatilo opened this issue 2 years ago • 19 comments

GitHub Apps have the ability to request access to individual files which is great for fine-grained access.

image

However, as of v47.1.0, this will prevent someone from being able to download a file from the root repository because of the way DownloadContents implements directory scanning first:

https://github.com/google/go-github/blob/8a4bdb5e400fef2e8d783d35d8f4cfd1bc79c01e/github/repos_contents.go#L129-L134

The first thing that the function does is get just the directory name with path.Dir but if you try to download a file like just CODEOWNERS which is a valid location for a CODEOWNERS file, then the first request to the API will create a URL like so:

https://api.github.com/repos/$owner/$repo/contents/

At the moment, there's no way to add the root directory to the list of single file paths. You cannot add a blank path and adding just / doesn't work either.

abatilo avatar Sep 25 '22 22:09 abatilo

So was this a regression caused by a change to v47.1.0 ?

Do you want to create a PR to fix this, @abatilo , or would you like me to open up this issue to other contributors to this repo to address?

gmlewis avatar Sep 25 '22 23:09 gmlewis

Hi @gmlewis, I think I actually poorly chose my words. I shouldn't have said "as of v47.1.0" but instead it's more that I found this behavior while using v47.1.0. It appears that this has been the behavior for the history of the function? I don't think I have the availability to work on this so I'd be happy to have this open up to anyone who wants to contribute!

abatilo avatar Oct 02 '22 04:10 abatilo

Thank you, @abatilo .

This would be a great PR for any new contributor to this repo or a new Go developer. All contributions are greatly appreciated!

Feel free to volunteer for any issue and the issue can be assigned to you so that others don't attempt to duplicate the work.

Please check out our CONTRIBUTING.md guide to get started. (In particular, please remember to go generate ./... and don't use force-push to your PRs.)

Thank you!

gmlewis avatar Oct 02 '22 11:10 gmlewis

I took a look into this tonight because I'd like to get involved in the project. Based on the path.Dir docs I found here, it seems like it should be possible for the existing code to download files from the root of a repo:

I was able to construct a minimal working example of this as follows:

  1. Created a test module called go-gh-testcase. main.go of this module looks like:
package main

import (
	"golang.org/x/oauth2"
	"github.com/google/go-github/github"
	"context"
	"fmt"
)

func main() {
	ctx := context.Background()
	ts := oauth2.StaticTokenSource(
		&oauth2.Token{AccessToken: "<YOUR-GITHUB-PAT-HERE>"})
	tc := oauth2.NewClient(ctx, ts)
	client := github.NewClient(tc)

	const (
		owner = "google"
		repo = "go-github"
	)
	filepath := "LICENSE"

	rc, _, err := client.Repositories.DownloadContents(ctx, owner, repo, filepath, nil)
	if err != nil {
		fmt.Errorf("Error was: %s\n", err)
	}
	defer rc.Close()
        byteCount := 100
	buf := make([]byte, byteCount)
        n, err := rc.Read(buf)
        if err != nil {
            fmt.Errorf("Error while reading into buffer: %s\n", err)
        }
	fmt.Printf("Contents of first %d bytes are: %s\n", byteCount, buf[:n])
}
  1. Modified go-github/github/repos_content.go very slightly to add some fmt.Printlns in the appropriate places:
diff --git a/github/repos_contents.go b/github/repos_contents.go
index be58fd5..d4a2319 100644
--- a/github/repos_contents.go
+++ b/github/repos_contents.go
@@ -127,13 +127,16 @@ func (s *RepositoriesService) GetReadme(ctx context.Context, owner, repo string,
 // code to verify the content is from a successful response.
 func (s *RepositoriesService) DownloadContents(ctx context.Context, owner, repo, filepath string, opts *RepositoryContentGetOptions) (io.ReadCloser, *Response, error) {
        dir := path.Dir(filepath)
+       fmt.Printf("Dir is: %s\n", dir)
        filename := path.Base(filepath)
+       fmt.Printf("Filename is: %s\n", filename)
        _, dirContents, resp, err := s.GetContents(ctx, owner, repo, dir, opts)
        if err != nil {
                return nil, resp, err
        }
 
        for _, contents := range dirContents {
+               fmt.Printf("\tName of found content item is: %s\n", *contents.Name)
                if *contents.Name == filename {
                        if contents.DownloadURL == nil || *contents.DownloadURL == "" {
                                return nil, resp, fmt.Errorf("no download link found for %s", filepath)

Because of the behavior of path.Dir, as outlined here, a filepath without any /s (such as the path "LICENSE" I chose in the first code block) will result in dir being set to ".". So I think this would actually result in a URL that looks like: https://api.github.com/repos/$owner/$repo/contents/.

Running the code above, I got the result:

Dir is: .
Filename is: LICENSE
        Name of found content item is: .codecov.yml
        Name of found content item is: .github
        Name of found content item is: .gitignore
        Name of found content item is: .golangci.yml
        Name of found content item is: AUTHORS
        Name of found content item is: CONTRIBUTING.md
        Name of found content item is: LICENSE
Contents of first 100 bytes are: Copyright (c) 2013 The go-github AUTHORS. All rights reserved.

Redistribution and use in source and

which is the first 100 bytes of go-github's LICENSE file.

Doing the same test on the go.sum file (last file in the root of the repo as it appears in GitHub UI) yields:

Dir is: .
Filename is: go.sum
        Name of found content item is: .codecov.yml
        Name of found content item is: .github
        Name of found content item is: .gitignore
        Name of found content item is: .golangci.yml
        Name of found content item is: AUTHORS
        Name of found content item is: CONTRIBUTING.md
        Name of found content item is: LICENSE
        Name of found content item is: README.md
        Name of found content item is: example
        Name of found content item is: github
        Name of found content item is: go.mod
        Name of found content item is: go.sum
Contents of first 100 bytes are: github.com/golang/protobuf v1.3.1/go.mod h1:6lQm79b+lXiMfvg/cZm0SGofjICqVBUtrP5yJMmIC1U=
github.com/

So, it seems like specifying a filename in the repo root will still behave properly; the API call will just consider dir to be ".", but the file will be found in the for loop over the contents of "." if it's in the root.


I've never participated in this project before, and I tried to make sure I understood the context before commenting, but if I've totally misunderstood something here, please let me know! I'm just trying to help and learn a little Golang! 😄

zaataylor avatar Oct 07 '22 03:10 zaataylor

I've never participated in this project before, and I tried to make sure I understood the context before commenting, but if I've totally misunderstood something here, please let me know! I'm just trying to help and learn a little Golang!

Thank you, @zaataylor , for the detailed analysis and excellent write-up... It is greatly appreciated! I believe that you have demonstrated that this is working as intended.

I will leave this issue open for a short while in case there are any other comments, but otherwise I believe this issue can be closed.

gmlewis avatar Oct 07 '22 11:10 gmlewis

Hi @zaataylor, your solution demonstrates accessing things at the root of the repo for a GitHub Personal Access Token which does not have the same fine grained per file permissions that a GitHub App might request.

Maybe this issue needs to be re-worded to be specific about the file path permissions that you can set on a GitHub App.

abatilo avatar Oct 09 '22 00:10 abatilo

@abatilo - is it not possible to give the GitHub App the appropriate permissions needed to access the root of the repo?

If that is truly the case, then I think that you need to contact GitHub tech support and ask how to accomplish this in a GitHub App. Feel free to report back here with what you find out.

gmlewis avatar Oct 09 '22 00:10 gmlewis

It's possible. I discovered this because I am building an app to do things based on CODEOWNERS, which can be in 3 locations. CODEOWNERS, docs/CODEOWNERS, and .github/CODEOWNERS.

There's no actual need for the app to request permissions or have any ability to read files in the root of the repo and my guess is that security minded organizations would prefer that.

I do still think that the "correct" solution here is to change the DownloadContents code from it's original implementation from 8 years ago to not need to do a GetContents call on the path.Dir of the path before attempting to do a for loop over each file in the folder to get a string match.

I may have time this week to open a PR with that change and see if the proposed solution is alright!

abatilo avatar Oct 10 '22 15:10 abatilo

Hello, may I try to take a look on this?

petersondmg avatar Oct 24 '22 03:10 petersondmg

Hello, may I try to take a look on this?

Sorry, @petersondmg , I should have assigned this to @abatilo from the previous comment. Let's wait to hear back what @abatilo discovers.

gmlewis avatar Oct 24 '22 10:10 gmlewis

I did NOT have the time when I had hoped to. Please feel free to assign to @petersondmg! I appreciate it though @gmlewis

abatilo avatar Oct 24 '22 17:10 abatilo

I did NOT have the time when I had hoped to. Please feel free to assign to @petersondmg! I appreciate it though @gmlewis

No problem, @abatilo - thank you for the note, and best wishes to you.

@petersondmg - it is yours.

gmlewis avatar Oct 24 '22 17:10 gmlewis

Hello Maybe I misunderstood the point of the issue, but I was able to 'download files from root of repo with DownloadContents', using a GitHub App token, and a personal token as well.

To generate a Github App Token I did:

1 - generated a private key for the app 2 - authenticated as a GitHub App (generated the jwt) 3 - retrieved an access token by requesting the endpoint using the jwt https://api.github.com/app/installations/<app_installation_id>/access_tokens 4 - used the access token to execute the script posted on comments above

It was able to list files on the root and download the file.

Should I try a different approach? @gmlewis @abatilo

Thanks.

petersondmg avatar Oct 29 '22 04:10 petersondmg

Thank you, @petersondmg ! Let's wait to hear from @abatilo before proceeding.

gmlewis avatar Oct 29 '22 09:10 gmlewis

Hi @petersondmg. If you configure the GitHub App with the Contents permissions, then there aren't any problems with downloading from the root. However, if you configure single files only like I have in the screenshot in the main post, and you provide a single file in the root as the only permission, then you won't be able to download it because you've only granted permission to the single path and not the directory.

abatilo avatar Oct 29 '22 16:10 abatilo

I believe this issue can be closed. Do you agree, @abatilo and @petersondmg ?

gmlewis avatar Nov 07 '22 14:11 gmlewis

@gmlewis I'm almost done with a solution to the point brought by @abatilo, that is, downloading the file without having to list the base directory. Just have to adjust the tests. I'll try to send a PR till the end of the week if you think it's still valid.

petersondmg avatar Nov 07 '22 15:11 petersondmg

Any news on the proposed solution? How can we help?

githoober avatar Apr 24 '23 01:04 githoober

Hi there, was there any progress made on this?

JayDubyaEey avatar Dec 13 '23 22:12 JayDubyaEey