Unable to fetch diff for a large PR
From the recent time I started to get fetch errors on PR with more than 300 files. At least couple months ago there were no problems, but now I see the following:
ERROR:CPP Linter:response returned 406 from GET https://api.github.com/repos/.../pulls/NNN with message:
{"message":"Sorry, the diff exceeded the maximum number of files (300).
Consider using 'List pull requests files' API or locally cloning the repository instead.","errors":
[{"resource":"PullRequest","field":"diff","code":"too_large"}],
"documentation_url":"https://docs.github.com/rest/pulls/pulls#list-pull-requests-files","status":"406"}
Is there a possibility to solve this issue on cpp-linter side?
We could try reverting back to the Github REST API endpoint that lists the files (with individual diffs) instead of trying to get the whole diff... IMHO, this isn't a good idea though. We changed to using REST API to get the full diff (in 1 HTTP request) to avoid
- having to re-assemble the entire diff from multiple diffs (which don't have all the diff info for each file).
- a file with a a lot of changes does not include diff at all when getting a list of files with individual diffs. This is the main reason we switched to getting the entire diff.
Getting the entire diff at once proved to be more reliable.
Honestly, a PR with 300 files changed is undesirable for many reasons, especially when working as part of a team. I'm willing to bet (& I'm not a gambling type of person) that parts of that large PR could have been broken up into smaller (& possibly sequential) PRs.
FYI, this problem isn't really specific to github. I had to stop working on gitlab support for the exact same reason: Gitlab's REST API does not offer a way to get the full diff and still neglects to return the diff for files with a lot of changes. However, github seems to have a bigger cap on the amount of data that the REST API can return.
or locally cloning the repository instead
This idea is less feasible because the default behavior of actions/checkout provides a shallow checkout. So, git diff on a shallow checkout returns all file changes since the repo was created. My experiments with increasing the fetch-depth option yielded no good results. And the fetch depth would be at the mercy of the repo's admin(s). Not to mention, the increase in CI time to checkout a repo with a non-zero fetch-depth, and the storage capacity of CI runners may be inadequate for large repos.
Honestly, a PR with 300 files changed is undesirable for many reasons, especially when working as part of a team. I'm willing to bet (& I'm not a gambling type of person) that parts of that large PR could have been broken up into smaller (& possibly sequential) PRs.
For most projects, I would be inclined to agree, the issue I have is that my project is a game project and so sometimes the art assets can easily throw it over the limit. I wish the API had some way to filter on the type of files, so that for example, only text files are considered in the diff.
Its true, the REST API endpoints don't provide an option to filter the files on a event's diff. Such a behavior might be more suited as a specific endpoint (like what gitea provides). Maybe this could be a feature request to GitHub.
Hi I also ran into this issue for a large PR refactoring various dependencies. basically removing a lot of files. I think it would be fine if the checks fails for such large diffs, But it would be nice if it failed in a nice way, not just a crash as is now.
If you're suggesting a silent failure (with zero exit code), then I'm sorry but no. If cpp-linter cannot get the diff of an event, then that is a critical failure. It doesn't only happen on a large diff, it could also mean the GitHub token used doesn't have adequate permissions.
FYI, disabling lines-changed-only, files-changed-only, and the PR review feature (tidy-review and format-review) should avoid needing the diff of the event.
I undersand, that makes sense. Not much one can do.
Github REST API has an endpoint that lists the files in a PR (with individual diffs) and an equivalent endpoint for commits.
[!warning] These endpoints take longer to consume/process and are limited to 3000 changed files.
I'm thinking about using these endpoints as a "plan B" for when this large diff problem is hit, but ultimately there is nothing we can do for ridiculously large PRs/commits. The plan B would still be subject to flaws in GitHub's REST API.
v2.13.0 will resort to using a paginated request/response for getting changed files' diff. This increases the max changed file count to 3000.
[!caution] There are still some other limitations imposed by the REST API that could easily surface with large diffs.
thread-commentsare truncated to 65535 bytes maximum, so some feedback from clang-tidy/format may not be reported.file-annotationsposted by 1 workflow are capped at 50 (or something similarly small)- PR review suggestions might also be limited to something like 25 comments (needs research/verification), but this feature is still experimental. The cpp-linter package codebase does not currently include compensation for the REST API limits about PR reviews.
In the end, it is still highly recommended to split up large changes into several incremental contributions.