terragrunt
Error in call to function "get_repo_root" - "git" executable not found in $PATH failure
Describe the bug
We are currently experiencing intermittent build failures in CircleCI when running validate commands against multiple modules in parallel (our current configuration is parallelism: 25). The issue only surfaces on some of the parallelised validate jobs, and we're utilising CircleCI's test splitting functionality for this.
The error we're currently getting is below:
time=2023-12-28T19:57:53Z level=error msg=[module path omitted]/terragrunt.hcl:7,15-29: Error in function call; Call to function "get_repo_root" failed: exec: "git": executable file not found in $PATH., and 1 other diagnostic(s)
time=2023-12-28T19:57:53Z level=error msg=Unable to determine underlying exit code, so Terragrunt will exit with error code 1
Makefile:28: recipe for target '<some target>' failed
We're running this in a Docker image, which definitely has git installed. A snippet of the Dockerfile is below:
ARG DEBIAN_FRONTEND=noninteractive
RUN apt-get update && \
    apt-get install -y --no-install-recommends \
      automake \
      autoconf \
      bsdmainutils \
      ca-certificates \
      libreadline-dev \
      libncurses-dev \
      libssl-dev \
      libyaml-dev \
      libxslt-dev \
      libffi-dev \
      libtool \
      tzdata \
      unixodbc-dev \
      unzip \
      curl \
      git && \
    rm -rf /var/lib/apt/lists/*
ENV PATH="$PATH:/root/.asdf/shims:/root/.asdf/bin"
<other asdf related things>
To Reproduce
- Build a Docker image with the above tooling and a base image of ubuntu:22.04. Additionally install terragrunt and terraform inside the image; we copy a large number of arbitrarily nested modules into the container.
- Run the following through CircleCI:
< ... snipped ... >
validate:
  executor: default
  parallelism: 25
  steps:
    - setup
    - run:
        name: Build image
        command: <snipped>
    - run:
        name: Validate
        command: |
          find <basedir>/ -mindepth 2 -name "terragrunt.hcl" -printf "%h\n" | \
          circleci tests split | \
          xargs -I {} docker run --rm \
            -v /tmp/.terraform.d/plugin-cache:/root/.terraform.d/plugin-cache \
            -e DISABLE_BACKEND=true \
            <img> terragrunt -w {} validate
< ... snipped ... >
If you need more of the CircleCI config I can probably redact most of it.
Expected behavior
I'd expect each of the parallelised jobs to be able to find the git executable. Only ever 1-4 jobs fail; the rest succeed and find the binary fine.
Nice to have
- [ ] Terminal output
- [ ] Screenshots
Versions
- Terragrunt version: 0.54.1
- Terraform version: 1.6.3
- Environment details (Ubuntu 20.04, Windows 10, etc.): CircleCI ubuntu-2204:2023.10.1 machine image, with docker_layer_caching enabled
Additional context
This issue only presented itself when we bumped to >=0.50.0. It should be mentioned that at the same time we also went from Terraform version 1.4.5 --> 1.6.3. It's hard to tell whether the issue is caused by the plugin-cache volume mount (with which I have seen some issues) or by Terragrunt itself.
Thanks in advance!
Hi, it would be helpful to have an example repository where the issue happens. I tried to reproduce the same error in https://github.com/denis256/terragrunt-tests/tree/master/issue-2873 but without success.
Hey @denis256,
The Dockerfile looks correct, our repository is private within our Organisation, but I can tell you that we have around 770-ish modules. Is there anything else I can provide that might make this easier to test? I will conduct some internal testing with another job execution platform (i.e. GHA) to see if I can also reproduce this there.
Hi, we need a way to reproduce the same error, through a Dockerfile + repo, a CircleCI job, or a GitHub Action.
Hey @denis256, since our repo is private within our Organisation it will be difficult for me to point you to a repo which can simulate the number of terragrunt modules we have.
Right before executing terragrunt -w <path to module> validate I added some logging to dump out the path to git, which definitely exists.
which git && echo "OK, found 'git'" || echo "NOK, could not find 'git'" && \
DISABLE_BACKEND=true bin/terragrunt -w <path to module> validate
/usr/bin/git
OK, found 'git'
time=2024-01-11T10:16:21Z level=error msg=<a base dir>/terragrunt.hcl:2,15-29: Error in function call; Call to function "get_repo_root" failed: exec: "git": executable file not found in $PATH., and 1 other diagnostic(s)
I was able to localize the issue. It was caused by a data race in config/dependency.go, which was causing data corruption and resulted in the PATH value in TerragruntOptions.Env being reset. When that value was propagated to the git subprocess call, the invocation failed. The issue was quite hard to reproduce, as it only showed up on CircleCI in a Docker container, and only in <1% of all Terragrunt runs. I will open a PR to fix it.
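For readers unfamiliar with this class of bug, here is a minimal sketch of one common way to avoid it: give each goroutine its own copy of the environment map instead of letting several goroutines write to a shared one. This is not the actual patch, and it does not reproduce the code in config/dependency.go. The options type, the clone method, runModules, and the EXAMPLE_MODULE variable are all hypothetical names chosen for illustration.

```go
package main

import (
	"fmt"
	"sync"
)

// options is a hypothetical stand-in for a struct that carries an
// environment map, loosely modelled on the Env field mentioned above.
type options struct {
	Env map[string]string
}

// clone returns a copy whose Env map is independent of the original, so
// writes to the copy can never corrupt the shared map.
func (o *options) clone() *options {
	env := make(map[string]string, len(o.Env))
	for k, v := range o.Env {
		env[k] = v
	}
	return &options{Env: env}
}

// runModules processes each module in its own goroutine and records the
// PATH value each goroutine observed.
func runModules(base *options, modules []string) map[string]string {
	var wg sync.WaitGroup
	var mu sync.Mutex
	seen := make(map[string]string)
	for _, m := range modules {
		wg.Add(1)
		go func(module string) {
			defer wg.Done()
			// Concurrent reads of base.Env are safe as long as nobody
			// writes to it; all writes go to the private copy.
			local := base.clone()
			local.Env["EXAMPLE_MODULE"] = module // hypothetical variable
			mu.Lock()
			seen[module] = local.Env["PATH"]
			mu.Unlock()
		}(m)
	}
	wg.Wait()
	return seen
}

func main() {
	base := &options{Env: map[string]string{"PATH": "/usr/bin"}}
	for module, path := range runModules(base, []string{"a", "b", "c"}) {
		fmt.Println(module, path)
	}
}
```

Races like this can often be surfaced locally by running the Go tests or binary with the -race flag, although a failure rate this low means a single clean run proves little.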