checkov icon indicating copy to clipboard operation
checkov copied to clipboard

Checkov fails to download modules from Azure DevOps repos when run locally on Windows. 'fatal: could not create leading directories of ...'

Open patrcoff opened this issue 1 year ago • 2 comments

When running Checkov locally on Windows against terraform codebases which include modules referenced by git address to an Azure DevOps repository, the following errors are seen upon attempts to download the modules.

2024-02-14 11:49:45,458 [ThreadPoolEx] [WARNI]  failed to get git::[[email protected]](mailto:[email protected]):v3/<company>/IAC/tf-azurerm-storage-account?ref=v4.2.0 because of Cmd('git') failed due to: exit code(128)
  cmdline: git clone -v --depth=1 -b v4.2.0 -- [[email protected]:v3/<company>/IAC/tf-azurerm-storage-account](mailto:[email protected]:v3/<company>/IAC/tf-azurerm-storage-account) C:\Users\username\repos\checkov_external_modules\ssh.dev.azure.com:v3\<company>\IAC\tf-azurerm-storage-account\v4.2.0
  stderr: 'fatal: could not create leading directories of 'C:\Users\username\repos\checkov_external_modules\ssh.dev.azure.com:v3\<company>\IAC\tf-azurerm-storage-account\v4.2.0': Invalid argument
'

The scanning of these modules is ultimately skipped due to these failures.

From my initial troubleshooting I have located the source of the issue being that the source address, which includes the colon character is being used as the temporary folder name. On Windows, a colon is not a valid character for filepaths.

This issue is likely similar, but not the same as seen in issue 5543

I have reviewed the source code and found that there is actually already logic to handle the special charactar limitations of windows, but this logic is being overwritten at a later stage in the execution flow for at least the GenericGitLoader class, possibly in others - I am yet to check. I have made a quick and dirty 'fix' locally to bypass the offending logic which overwrites the compatible folder names (which are pre-set, outside the loader class), but not yet tested other scenarios or run the tests - I intend to do this this week however. To be sure of my fix, I will need to:

  • [ ] verify no other path of execution is possible to gain entry into the GenericGitLoader._load_module method (git_loader.py) which does not first pass through ModuleLoaderRegistry.load (registry.py), where a windows compatible directory name is already generated.
  • [ ] verify no dependency on the current naming format exists in any other part of code (it shouldn't but want to be sure)
  • [ ] obviously need to do the rest of the standard stuff i.e. run tests etc but that's all covered in PR requirements of course

I believe the fix is going to simply be to remove any overwriting of the directory path name inside the loaders themselves, provided the above is checked off succesfully. This is effectively what I've done in my basic local 'fix' which worked for me but I've not tested any other scenarios yet.

Assuming this fix is valid however, I believe it would also solve the issue in 5543, by implementing the same fix in the gitlab loader. I feel it may be worthwhile going through all of the loaders to remove any overwriting of the directory name for completeness. Please let me know if you agree or would prefer this be kept isolated to my specific issue within the genric git loader. I'm happy to discuss in further detail if you are unsure.

patrcoff avatar Feb 18 '24 22:02 patrcoff

hey @patrcoff as I mentioned in the linked issue, it should be possible to remove special characters or shorten the path. The only challenge is to properly test it. Especially make sure, the code doesn't try to download the same module multiple times.

gruebel avatar Feb 19 '24 22:02 gruebel