go-github icon indicating copy to clipboard operation
go-github copied to clipboard

DownloadContents can't always download files in large directories

Open ran-arigur opened this issue 2 months ago • 4 comments

*github.RepositoriesService provides a special helper method called DownloadContents [code link] to get the contents of a file even if it's big enough that the result from GetContents method won't include them. DownloadContents works by first trying GetContents on the file itself, and then if that doesn't work (or doesn't seem to work, e.g. because the file is just genuinely empty), it falls back to calling GetContents on the containing directory, iterating over the results until it gets to the desired file, and then using the DownloadUrl for that desired file. (There's also another helper method called DownloadContentsWithMeta [code link] that does the same thing but returns a bit more information from the API calls; everything here applies to that method as well.)

The problem is, when you call GetContents on a directory, it only returns the first 1000 files. [doc link] So if the initial GetContents call (the one for the specific file) didn't work or didn't seem to work, and if the file isn't one of the first 1000 in the directory, then this approach will fail with the error "no file named %s found in %s".

The good news is, there's a simple fix: we don't actually need to call GetContents on the containing directory at all, because the initial call to GetContents on the individual file already gives the same information (including the DownloadUrl), even if the file is big enough that it doesn't contain the actual file contents. So we can just use that. In addition to fixing this issue, that would also fix https://github.com/google/go-github/issues/2480. (In fact, I see that this same solution has also been suggested in comments there.)

ran-arigur avatar Nov 07 '25 00:11 ran-arigur

Thank you, @ran-arigur - would you like to create a PR to fix this and #2480?

gmlewis avatar Nov 07 '25 01:11 gmlewis

Thank you, @ran-arigur - would you like to create a PR to fix this and #2480?

If @ran-arigur does not have the capacity to do it, I would like to step in and try to fix it.

sheeeng avatar Nov 07 '25 08:11 sheeeng

If @ran-arigur does not have the capacity to do it, I would like to step in and try to fix it.

Thank you, @sheeeng! It's yours.

gmlewis avatar Nov 07 '25 14:11 gmlewis

If @ran-arigur does not have the capacity to do it, I would like to step in and try to fix it.

I'm not sure whether I'd have capacity — if you're interested, please be my guest! 🙂

I have some implementation thoughts below, but please feel free to ignore them and do things a totally different way!


  • I think we should add a method (*github.RepositoryContent).DownloadContent(...), analogous to the existing (*github.RepositoryContent).GetContent(...) except that it also supports downloading content (which firstly enables large files, and secondly enables getting content for the RepositoryContent entries that are returned from a call to GetContents() on a directory). I think there are four cases this method would need to handle:
    • If the Type is something other than "file", return an error. (The reason for this explicit check is that the RepositoryContent might represent a directory returned from a call to GetContents() on its parent directory, and in that case it will look exactly like an empty file if we don't explicitly check the Type.)
    • Otherwise, if the Size is 0, return an io.ReadCloser over the empty string. (Or this can be folded into the following case.)
    • Otherwise, if the Content is present and non-empty, delegate to GetContent() and return an io.ReadCloser over the result.
    • Otherwise, use the DownloadURL. If it's missing, or we get an error or a non-2xx response, return an error.
  • (*github.RepositoriesService).DownloadContentsWithMeta(...) can delegate to that DownloadContent method, and (*github.RepositoriesService).DownloadContents(...) can delegate to DownloadContentsWithMeta. So these methods would be very simple, and mainly just there for backward-compatibility; the bulk of the complexity would have been moved to DownloadContent.

ran-arigur avatar Nov 07 '25 21:11 ran-arigur