git.exc.GitCommandError: Cmd('git') failed due to: exit code(128)

Open bdovaz opened this issue 1 year ago • 13 comments

@nvuillam I am trying to run MegaLinter from an Azure Pipelines pipeline, but it fails with the following error.

I also don't understand why it is running diffs at all (which, by the way, take minutes and make the run very slow) when I have VALIDATE_ALL_CODEBASE: true set.

  File "/megalinter/utils.py", line 230, in list_updated_files
    changed_files = [item.a_path for item in repo.index.diff(None)]
  File "/usr/local/lib/python3.10/site-packages/git/index/base.py", line 1309, in diff
    return super(IndexFile, self).diff(other, paths, create_patch, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/git/diff.py", line 175, in diff
    index = diff_method(self.repo, proc)
  File "/usr/local/lib/python3.10/site-packages/git/diff.py", line 570, in _index_from_raw_format
    handle_process_output(proc, lambda byt: cls._handle_diff_line(byt, repo, index),
  File "/usr/local/lib/python3.10/site-packages/git/cmd.py", line 176, in handle_process_output
    return finalizer(process)
  File "/usr/local/lib/python3.10/site-packages/git/util.py", line 386, in finalize_process
    proc.wait(**kwargs)
  File "/usr/local/lib/python3.10/site-packages/git/cmd.py", line 502, in wait
    raise GitCommandError(remove_password_if_present(self.args), status, errstr)
git.exc.GitCommandError: Cmd('git') failed due to: exit code(128)
  cmdline: git diff --abbrev=40 --full-index -M --raw -z --no-color
"""

The above exception was the direct cause of the following exception:

AutoInterrupt wait stderr: b''
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/local/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/megalinter/run.py", line 15, in <module>
    linter.run()
  File "/megalinter/MegaLinter.py", line 155, in run
    self.process_linters_parallel(active_linters, linters_do_fixes)
  File "/megalinter/MegaLinter.py", line 240, in process_linters_parallel
    updated_linters = pool_result.get()
  File "/usr/local/lib/python3.10/multiprocessing/pool.py", line 771, in get
    raise self._value
git.exc.GitCommandError: Cmd('git') failed due to: exit code(128)
  cmdline: git diff --abbrev=40 --full-index -M --raw -z --no-color

bdovaz avatar Jul 18 '22 09:07 bdovaz

Please can you add LOG_LEVEL=DEBUG and share the full log ? If you also can provide your .mega-linter.yml and the Azure pipelines script, it would be great :)

nvuillam avatar Jul 18 '22 12:07 nvuillam

.mega-linter.yml

APPLY_FIXES: all
ENABLE_LINTERS:
  # Languages
  - BASH_EXEC
  - BASH_SHELLCHECK
  - BASH_SHFMT
  - POWERSHELL_POWERSHELL
  # Formats
  - ENV_DOTENV_LINTER
  - JSON_JSONLINT
  - JSON_ESLINT_PLUGIN_JSONC
  - JSON_V8R
  - JSON_PRETTIER
  - MARKDOWN_MARKDOWNLINT
  - MARKDOWN_MARKDOWN_LINK_CHECK
  - MARKDOWN_MARKDOWN_TABLE_FORMATTER
  - YAML_PRETTIER
  - YAML_YAMLLINT
  - YAML_V8R
  # Tooling formats
  - DOCKERFILE_HADOLINT
LOG_LEVEL: DEBUG
OUTPUT_DETAIL: detailed
OUTPUT_FORMAT: tap
PRINT_ALPACA: false
PRINT_ALL_FILES: true
SHOW_ELAPSED_TIME: true
VALIDATE_ALL_CODEBASE: true

azure-pipelines.yml (task used)

- task: PowerShell@2
  displayName: Code Scan using MegaLinter
  inputs:
    targetType: inline
    script: |
      docker run `
      --rm `
      -v $(System.DefaultWorkingDirectory):/tmp/lint `
      -v $(Build.ArtifactStagingDirectory)/MegaLinter/report:/tmp/azure-pipelines-megalinter-report `
      -e REPORT_OUTPUT_FOLDER=/tmp/azure-pipelines-megalinter-report `
      -e MEGALINTER_CONFIG=.mega-linter.yml `
      oxsecurity/megalinter-dotnet:v6
    pwsh: true
  continueOnError: true

Log (at debug level). I can only share the fragment where the exception appears and which does not refer to our code (I am under NDA and cannot share the full log):

❌ Linted [POWERSHELL] files with [powershell]: Found 11 error(s) - (40.8s)
[Tap Reporter] Generated TAP report: /tmp/azure-pipelines-megalinter-report/tap/mega-linter-POWERSHELL_POWERSHELL.tap
[Text Reporter] Generated TEXT report: /tmp/azure-pipelines-megalinter-report/linters_logs/ERROR-POWERSHELL_POWERSHELL.log
[prettier] command: ['prettier', '--write', '/tmp/lint/.azuredevops/dependabot.yml']
[prettier] CWD: /
[prettier] result: 0 tmp/lint/.azuredevops/dependabot.yml 182ms

Popen(['git', 'diff', '--abbrev=40', '--full-index', '-M', '--raw', '-z', '--no-color'], cwd=/tmp/lint, universal_newlines=False, shell=None, istream=None)
AutoInterrupt wait stderr: b''
multiprocessing.pool.RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/megalinter/MegaLinter.py", line 35, in run_linters
    linter.run()
  File "/megalinter/Linter.py", line 554, in run
    self.update_files_lint_results(
  File "/megalinter/Linter.py", line 612, in update_files_lint_results
    updated_files = utils.list_updated_files(self.github_workspace)
  File "/megalinter/utils.py", line 230, in list_updated_files
    changed_files = [item.a_path for item in repo.index.diff(None)]
  File "/usr/local/lib/python3.10/site-packages/git/index/base.py", line 1309, in diff
    return super(IndexFile, self).diff(other, paths, create_patch, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/git/diff.py", line 175, in diff
    index = diff_method(self.repo, proc)
  File "/usr/local/lib/python3.10/site-packages/git/diff.py", line 570, in _index_from_raw_format
    handle_process_output(proc, lambda byt: cls._handle_diff_line(byt, repo, index),
  File "/usr/local/lib/python3.10/site-packages/git/cmd.py", line 176, in handle_process_output
    return finalizer(process)
  File "/usr/local/lib/python3.10/site-packages/git/util.py", line 386, in finalize_process
    proc.wait(**kwargs)
  File "/usr/local/lib/python3.10/site-packages/git/cmd.py", line 502, in wait
    raise GitCommandError(remove_password_if_present(self.args), status, errstr)
git.exc.GitCommandError: Cmd('git') failed due to: exit code(128)
  cmdline: git diff --abbrev=40 --full-index -M --raw -z --no-color
"""

The above exception was the direct cause of the following exception:

AutoInterrupt wait stderr: b''
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/local/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/megalinter/run.py", line 15, in <module>
    linter.run()
  File "/megalinter/MegaLinter.py", line 155, in run
    self.process_linters_parallel(active_linters, linters_do_fixes)
  File "/megalinter/MegaLinter.py", line 240, in process_linters_parallel
    updated_linters = pool_result.get()
  File "/usr/local/lib/python3.10/multiprocessing/pool.py", line 771, in get
    raise self._value
git.exc.GitCommandError: Cmd('git') failed due to: exit code(128)
  cmdline: git diff --abbrev=40 --full-index -M --raw -z --no-color
##[error]PowerShell exited with code '1'.

Overview:

+----MATCHING LINTERS-------------------+---------------------------------------------+----------------+------------+
| Descriptor | Linter                   | Criteria                                    | Matching files | Format/Fix |
+------------+--------------------------+---------------------------------------------+----------------+------------+
| BASH       | bash-exec                | .sh|.bash|.dash|.ksh                        | 21             | no         |
| BASH       | shellcheck               | .sh|.bash|.dash|.ksh                        | 21             | no         |
| BASH       | shfmt                    | .sh|.bash|.dash|.ksh                        | 21             | yes        |
| DOCKERFILE | hadolint                 | Dockerfile                                  | 6              | no         |
| ENV        | dotenv-linter            | .env                                        | 5              | yes        |
| JSON       | jsonlint                 | .json                                       | 176            | no         |
| JSON       | eslint-plugin-jsonc      | .json|.json5|.jsonc                         | 176            | yes        |
| JSON       | v8r                      | .json                                       | 176            | no         |
| JSON       | prettier                 | .json                                       | 176            | yes        |
| MARKDOWN   | markdownlint             | .md                                         | 13             | yes        |
| MARKDOWN   | markdown-link-check      | .md                                         | 13             | no         |
| MARKDOWN   | markdown-table-formatter | .md                                         | 13             | yes        |
| POWERSHELL | powershell               | .ps1|.psm1|.psd1|.ps1xml|.pssc|.psrc|.cdxml | 16             | no         |
| YAML       | prettier                 | .yml|.yaml                                  | 29             | yes        |
| YAML       | yamllint                 | .yml|.yaml                                  | 29             | no         |
| YAML       | v8r                      | .yml|.yaml                                  | 29             | no         |
+------------+--------------------------+---------------------------------------------+----------------+------------+

bdovaz avatar Jul 18 '22 19:07 bdovaz

@bdovaz, what happens when you run git diff --abbrev=40 --full-index -M --raw -z --no-color manually on the command line? Also, what is the output of git status?
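
For anyone following along, here is a minimal sketch of how one might capture the exit code and stderr of that command from Python instead of eyeballing the shell. The helper name `run_and_report` is made up for illustration and is not part of MegaLinter or gitpython; the command list is copied from the traceback above. Git uses exit code 128 for fatal errors, so the stderr text is usually the interesting part.

```python
import subprocess

# The exact command line reported in the traceback in this thread.
GIT_DIFF_CMD = [
    "git", "diff", "--abbrev=40", "--full-index",
    "-M", "--raw", "-z", "--no-color",
]


def run_and_report(cmd, cwd=None):
    """Run cmd and return (exit_code, stdout, stderr) without raising.

    Hypothetical helper for debugging: it never raises on a non-zero
    exit status, so the caller can inspect whatever git printed.
    """
    proc = subprocess.run(cmd, cwd=cwd, capture_output=True)
    return proc.returncode, proc.stdout, proc.stderr
```

Usage would be something like `code, out, err = run_and_report(GIT_DIFF_CMD, cwd="/tmp/lint")` inside the container, then printing `code` and `err`. The `/tmp/lint` path is the mount point from the pipeline task above.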

Kurt-von-Laven avatar Jul 18 '22 23:07 Kurt-von-Laven

Git status

On branch housekeeping/fix-linter-errors
Your branch is up to date with 'origin/housekeeping/fix-linter-errors'.

git diff --abbrev=40 --full-index -M --raw -z --no-color

Nothing

bdovaz avatar Jul 20 '22 09:07 bdovaz

@nvuillam @Kurt-von-Laven What I don't understand is why it takes so long to execute those "git diff", what are they for? Can they be disabled?

bdovaz avatar Jul 21 '22 09:07 bdovaz

It is making note of the files that were modified because fixes were applied. As a workaround, you could temporarily remove APPLY_FIXES: all from your MegaLinter config to avoid running the crashing diff. Is it possible there is something else modifying files besides MegaLinter? Can you reproduce the issue locally with mega-linter-runner? What is the exit code (echo $?) when you run the diff command manually? We run with APPLY_FIXES: all and have not encountered this issue either locally or in GitHub Actions. I am not yet sure what the key difference is between our scenarios.
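
To illustrate what "making note of the modified files" amounts to, here is a rough sketch of how the output of `git diff --raw -z` maps to a list of paths. This is not MegaLinter's or gitpython's actual code; `parse_raw_z` is a hypothetical name, and the sketch assumes the documented raw format where each entry is a metadata field starting with `:` followed by one NUL-terminated path (two for renames and copies).

```python
def parse_raw_z(output: bytes) -> list[str]:
    """Return the source path of every entry in `git diff --raw -z` output.

    Hypothetical helper: it mirrors the idea of collecting `a_path`
    for each diff item, without depending on gitpython.
    """
    fields = output.split(b"\0")
    paths = []
    i = 0
    while i < len(fields):
        meta = fields[i]
        if not meta.startswith(b":"):
            # Trailing empty field after the last NUL, or unexpected data.
            i += 1
            continue
        # The last space-separated token is the status, e.g. M, A, D, R100.
        status = meta.rsplit(b" ", 1)[-1].decode("ascii")
        # Renames (R) and copies (C) are followed by two paths: source, destination.
        path_count = 2 if status[:1] in ("R", "C") else 1
        entry_paths = fields[i + 1 : i + 1 + path_count]
        if entry_paths:
            paths.append(entry_paths[0].decode("utf-8", errors="surrogateescape"))
        i += 1 + path_count
    return paths
```

With APPLY_FIXES enabled, a file rewritten by a formatter (such as the dependabot.yml touched by prettier in the log) would show up as one `M` entry in this list.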

Kurt-von-Laven avatar Jul 21 '22 10:07 Kurt-von-Laven

The updated sources reporter also uses git diff to detect files that have been auto-fixed.

This is a standard git diff using gitpython; I don't see what could make such an operation take really long :/

nvuillam avatar Jul 21 '22 10:07 nvuillam

If you are running MegaLinter on macOS or Windows, maybe try running the following commands as suggested by this blog post to improve git diff performance if your repository has many files:

git config core.fsmonitor true
git config core.untrackedcache true

If you are running on Linux or already use Watchman, see the section titled "Using Watchman and the sample hook script."

Kurt-von-Laven avatar Jul 21 '22 19:07 Kurt-von-Laven

It is making note of the files that were modified because fixes were applied. As a workaround, you could temporarily remove APPLY_FIXES: all from your MegaLinter config to avoid running the crashing diff. Is it possible there is something else modifying files besides MegaLinter? Can you reproduce the issue locally with mega-linter-runner? What is the exit code (echo $?) when you run the diff command manually? We run with APPLY_FIXES: all and have not encountered this issue either locally or in GitHub Actions. I am not yet sure what the key difference is between our scenarios.

With:

APPLY_FIXES: none
UPDATED_SOURCES_REPORTER: false

I still see popen git diff logs... So there must be something else that launches these git diffs.

The updated sources reporter also uses git diff to detect files that have been auto-fixed.

This is a standard git diff using gitpython; I don't see what could make such an operation take really long :/

It's a large monorepo, this is probably the explanation.

I can understand that those git diff commands take longer because the repo has many files, but I would like:

  • It not to crash, which is unacceptable because it leaves me unable to use MegaLinter.
  • The performance of that command to be improved if possible.

If you are running MegaLinter on macOS or Windows, maybe try running the following commands as suggested by this blog post to improve git diff performance if your repository has many files:

git config core.fsmonitor true
git config core.untrackedcache true

If you are running on Linux or already use Watchman, see the section titled "Using Watchman and the sample hook script."

I have applied it but I don't see much improvement either. It says that it is compatible with Windows and macOS and I have tested it on macOS.

And worst of all, the crashes continue...

To provide more information, I have tested on different agents and different operating systems: Linux and macOS.

Same result: crash.

In fact, it gets more curious. If I run the command from the pipeline task locally, i.e.:

docker run `
      --rm `
      -v $(System.DefaultWorkingDirectory):/tmp/lint `
      -v $(Build.ArtifactStagingDirectory)/MegaLinter/report:/tmp/azure-pipelines-megalinter-report `
      -e REPORT_OUTPUT_FOLDER=/tmp/azure-pipelines-megalinter-report `
      -e MEGALINTER_CONFIG=.mega-linter.yml `
      oxsecurity/megalinter-dotnet:v6

No git diff command has been executed...

Looks like something specific to running in the Azure pipeline? But I don't understand it: it would have to be something in MegaLinter that behaves differently depending on whether we are in GitHub, GitLab or Azure Pipelines, no?

bdovaz avatar Jul 22 '22 06:07 bdovaz

I still see popen git diff logs... So there must be something else that launches these git diffs.

Yes, there are other diffs performed besides the one that is crashing. The two others I am aware of are the updated sources reporter @nvuillam mentioned and the VALIDATE_ALL_CODEBASE: false case where we use git diff to determine which files have been modified relative to the default branch. When I suggested removing APPLY_FIXES: all, I was referring only to skipping the git diff that appeared in your stack trace. Are you encountering other stack traces as well as you experiment with different configs and runners?

I can understand that those git diff commands take longer because the repo has many files, but I would like:

  • It not to crash, which is unacceptable because it leaves me unable to use MegaLinter.
  • The performance of that command to be improved if possible.

I agree with you, but I don't know what is going on at this point. If the bottleneck is git diff, it may be difficult to improve monorepo performance on our end. It just hit me that the file system monitor wouldn't move the needle much; that makes me wonder if the runner is low on memory or disk space since it's supposed to be a big win. That being said, the file system monitor doesn't improve the performance of the first git diff after the file watcher daemon is enabled, so I suppose it's an optimization that will generally work better locally than in CI.

Is it possible to do a sparse checkout in your scenario? If not, a drastic measure one could take would be to run a number of parallel jobs each tasked with linting a different subdirectory of your monorepo using sparse checkouts. It might be worth checking just to see if the same stack trace crops up with a single sparse checkout since that would imply that git diff may be running out of computing resources. One significant difference between running locally and running in most CI environments is that the former typically has significantly more resources.
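
As a sketch of the "parallel jobs per subdirectory" idea, assuming the monorepo's top-level directories are a sensible unit of work, something like the following could split them into shards. The function name `shard_paths` and the directory names in the usage note are invented for illustration; each shard would then be handed to its own job, for example as the argument list of `git sparse-checkout set`.

```python
def shard_paths(paths, shard_count):
    """Distribute paths round-robin over shard_count lists.

    Hypothetical helper: sorting first keeps the assignment stable
    between runs, so a given directory always lands in the same job.
    """
    if shard_count < 1:
        raise ValueError("shard_count must be at least 1")
    shards = [[] for _ in range(shard_count)]
    for index, path in enumerate(sorted(paths)):
        shards[index % shard_count].append(path)
    return shards
```

For example, `shard_paths(["services", "apps", "libs", "docs", "tools"], 2)` gives `[["apps", "libs", "tools"], ["docs", "services"]]`. Round-robin ignores directory size, so a real setup might want to weight by file count instead.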

In fact, it gets more curious. If I run the command from the pipeline task locally, i.e.:

docker run `
      --rm `
      -v $(System.DefaultWorkingDirectory):/tmp/lint `
      -v $(Build.ArtifactStagingDirectory)/MegaLinter/report:/tmp/azure-pipelines-megalinter-report `
      -e REPORT_OUTPUT_FOLDER=/tmp/azure-pipelines-megalinter-report `
      -e MEGALINTER_CONFIG=.mega-linter.yml `
      oxsecurity/megalinter-dotnet:v6

You or others following along may prefer the convenience of mega-linter-runner to handcrafted Docker commands.

Looks like something specific to running in the Azure pipeline? But I don't understand it: it would have to be something in MegaLinter that behaves differently depending on whether we are in GitHub, GitLab or Azure Pipelines, no?

This is why I asked if it is possible that something else is modifying the code. I am not aware of any logic internal to MegaLinter that branches based on the CI environment. If you haven't already, I would check whether you can reproduce the issue in a job that only runs MegaLinter and has no other steps (beyond checking out the repository of course).
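
One way to check whether something else is touching the workspace, without relying on the git diff that crashes, is to hash the files before and after a pipeline step and compare. The sketch below is only an illustration under that assumption; `snapshot` and `changed_paths` are hypothetical names, and reading whole files into memory may be too slow for a very large monorepo.

```python
import hashlib
import os


def snapshot(root, skip_dirs=(".git",)):
    """Map each file path (relative to root) to its SHA-256 digest."""
    digests = {}
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune skipped directories in place so os.walk does not descend into them.
        dirnames[:] = [d for d in dirnames if d not in skip_dirs]
        for name in filenames:
            full = os.path.join(dirpath, name)
            with open(full, "rb") as handle:
                digest = hashlib.sha256(handle.read()).hexdigest()
            digests[os.path.relpath(full, root)] = digest
    return digests


def changed_paths(before, after):
    """Return sorted paths that were added, removed, or modified."""
    every_path = before.keys() | after.keys()
    return sorted(p for p in every_path if before.get(p) != after.get(p))
```

Taking one snapshot right after checkout and another just before the MegaLinter step would show whether any earlier step modified files.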

Kurt-von-Laven avatar Jul 22 '22 08:07 Kurt-von-Laven

With such a config, I don't see how git diff could be called :/

Except if a linter itself decides to call git diff internally :/

DISABLE_LINTERS:
- REPOSITORY_GIT_DIFF
APPLY_FIXES: none
VALIDATE_ALL_CODEBASE: true
UPDATED_SOURCES_REPORTER: false

nvuillam avatar Jul 22 '22 17:07 nvuillam

Do you mean VALIDATE_ALL_CODEBASE: true?

Kurt-von-Laven avatar Jul 22 '22 17:07 Kurt-von-Laven

Yes, good catch, thanks :)

nvuillam avatar Jul 22 '22 17:07 nvuillam

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.

If you think this issue should stay open, please remove the O: stale 🤖 label or comment on the issue.

github-actions[bot] avatar Aug 22 '22 01:08 github-actions[bot]