checkout-files
Consider using the Git Data API to support larger files
During a recent workflow run, I attempted to use the checkout-files Action to check out a fairly large package-lock.json file (npm lockfile) weighing in at about 2 MB. I received this unhandled promise rejection:
Run Bhacaz/checkout-files@c8f01756bfd894ba746d5bf48205e19000b0742b
(node:1939) UnhandledPromiseRejectionWarning: HttpError: This API returns blobs up to 1 MB in size. The requested blob is too large to fetch via the API, but you can use the Git Data API to request blobs up to 100 MB in size.: {"resource":"Blob","field":"data","code":"too_large"}
at /home/runner/work/_actions/Bhacaz/checkout-files/c8f01756bfd894ba746d5bf48205e19000b0742b/node_modules/@octokit/request/dist-node/index.js:66:23
at processTicksAndRejections (internal/process/task_queues.js:93:5)
You should consider using the Git Data API to support downloading larger files. 📦
How do we get the value of file_sha for https://docs.github.com/en/rest/reference/git#get-a-blob?
The current action uses getContent(), which accepts the file path.
I had to do a little digging to explore that question! I've identified at least 2 viable ways.
Given that both of these approaches result in extra API calls, it might also be worthwhile to keep the current approach as the primary one, and only fall back to the secondary approach if the request fails with a 403 status code. That would mean a bit more code to maintain, but it is probably the best approach for most use cases. 🤷🏻
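A minimal sketch of that fallback idea, assuming an Octokit-style client that exposes `rest.repos.getContent` and `rest.git.getBlob`. The function name `fetchFileWithFallback` and the `resolveBlobSha` callback are hypothetical and only stand in for whichever SHA lookup is chosen; this is not the action's actual code.

```javascript
// Try the Contents API first; only fall back to the Git Data API on a 403.
// NOTE: a 403 can have other causes (for example rate limiting), so a real
// implementation may also want to inspect the error body for "too_large".
// `resolveBlobSha` is a hypothetical callback: (path) => Promise<blob SHA>.
async function fetchFileWithFallback(octokit, { owner, repo, path, ref }, resolveBlobSha) {
  try {
    const { data } = await octokit.rest.repos.getContent({ owner, repo, path, ref });
    return Buffer.from(data.content, 'base64');
  } catch (error) {
    if (error.status !== 403) throw error; // unrelated failure: surface it as-is
    const file_sha = await resolveBlobSha(path);
    const { data } = await octokit.rest.git.getBlob({ owner, repo, file_sha });
    return Buffer.from(data.content, 'base64');
  }
}
```

Small files still cost a single request this way; only the oversized ones pay for the extra lookups.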
Combining the Repository Contents and Git Data APIs
- Reduce the list of requested file paths into a unique list of file path parent directories
- Get the repo contents for each parent directory path (instead of for each file path)
- In each response, find those files with matching path values and grab their sha property (or just the full Git Data API blob URL from the _links.git property if you don't want to dynamically build the URL)
- Get a blob for each entry, still converting the responses from base64 as the action does today
Using only the Git Data API
- Get a tree for the current branch/sha
  - Would require PR #6, or analysis of the GitHub Actions event data to get the branch/sha/repository default_branch (or PR base branch, perhaps?)
  - If any of the requested file paths are not in the root directory, then you must add the ?recursive=true query param, or else make multiple queries to get individual trees based on the first response (especially if the response has truncated: true)
- For the entries in the response's tree array, find those with matching path values and grab their sha property (or just the full Git Data API blob URL from the url property if you don't want to dynamically build the URL)
- Get a blob for each entry, still converting the responses from base64 as the action does today
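Here is a rough sketch of the tree-based variant, again assuming an Octokit-style client (`rest.git.getTree`, `rest.git.getBlob`). The name `fetchViaTree` and the `treeish` parameter (a branch name or commit/tree SHA obtained elsewhere) are hypothetical, and the truncated case is only detected here, not handled.

```javascript
// Fetch one recursive tree listing, match the requested paths against it,
// then download each matching entry through the blob endpoint.
async function fetchViaTree(octokit, { owner, repo, treeish }, filePaths) {
  const { data } = await octokit.rest.git.getTree({
    owner,
    repo,
    tree_sha: treeish,
    recursive: 'true', // needed whenever a requested path is outside the root
  });
  if (data.truncated) {
    // Very large repositories: the listing is incomplete, so individual
    // subtrees would have to be requested instead. Not implemented here.
    throw new Error('Tree listing was truncated; request subtrees individually');
  }

  const wanted = new Set(filePaths);
  const files = new Map(); // path -> Buffer with the decoded file contents
  for (const entry of data.tree) {
    if (entry.type !== 'blob' || !wanted.has(entry.path)) continue;
    const { data: blob } = await octokit.rest.git.getBlob({ owner, repo, file_sha: entry.sha });
    files.set(entry.path, Buffer.from(blob.content, 'base64'));
  }
  return files;
}
```

Compared with listing directories, this needs a single lookup request regardless of how many directories are involved, at the price of downloading the whole tree listing.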
Here's an updated and easier option! 🎉
https://github.blog/changelog/2022-05-03-increased-file-size-limit-when-retrieving-file-contents-via-rest-api/
TL;DR: Keep doing things as you are today, but just set this custom media type on the file retrieval request headers:
Accept: application/vnd.github.v3.raw
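For illustration, here is roughly what that could look like with an Octokit-style client, assuming it exposes `rest.repos.getContent` and accepts per-request `headers`. The wrapper name `fetchRawFile` is hypothetical, and the actual change in the PRs may differ.

```javascript
// Same Contents API call as before, but asking for the raw media type.
// With this header the response body is the file itself, so there is no
// base64 `content` field to decode.
async function fetchRawFile(octokit, { owner, repo, path, ref }) {
  const { data } = await octokit.rest.repos.getContent({
    owner,
    repo,
    path,
    ref,
    headers: { accept: 'application/vnd.github.v3.raw' },
  });
  return data;
}
```

No SHA lookup or second request is involved, which is what makes this option simpler than the two approaches discussed earlier.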
Created PR: https://github.com/Bhacaz/checkout-files/pull/9
@sun + @Bhacaz, I have opened a PR to utilize the raw endpoint that @JamesMGreene mentioned: https://github.com/Bhacaz/checkout-files/pull/23
Any interest in merging this into the action? If not, I will promote this to my own public action.
Thanks!