Atlantis apply all after a failed apply; outputs Ran Apply for 0 projects
I have a repo that uses the default workspace but there are a number of different project folders.
Atlantis version: 0.8.3
Terraform version: v0.12.8
version: 3
projects:
- name: qa
  dir: qa_acct/qa_env
  terraform_version: v0.12.8
  autoplan:
    when_modified: ["../../projects/*", "*.tf*", "../../modules/*"]
    enabled: false
- name: staging
  dir: prod_acct/staging_env
  terraform_version: v0.12.8
  autoplan:
    when_modified: ["../../projects/*", "*.tf*", "../../modules/*"]
    enabled: false
- name: prod
  dir: prod_acct/prod_env
  terraform_version: v0.12.8
  autoplan:
    when_modified: ["../../projects/*", "*.tf*", "../../modules/*"]
    enabled: false
Plans are generated for all three projects as normal after commenting exactly atlantis plan.
Immediately afterward, commenting atlantis apply attempts to apply all three environments, as expected. In this case there was an apply error because an AWS IAM policy was misconfigured, and the plans were not applied. A commit was pushed to fix the issue and another atlantis apply was submitted. Note that atlantis plan was not run again after the fix commit was pushed. Atlantis behaved as if it had forgotten about the failed plans and assumed they had been applied successfully when, in fact, they had not been. I believe the expected behavior is to reject the apply, since new commits were made, and force another plan to be run. Is that correct?
The result was the following:
Ran Apply for 0 projects:
Automatically merging because all plans have been successfully applied.
Locks and plans deleted for the projects and workspaces modified in this pull request:
* dir: `prod_acct/prod_env` workspace: `default`
* dir: `prod_acct/staging_env` workspace: `default`
* dir: `qa_acct/qa_env` workspace: `default`
Yeah it's a bug. If autoplan had been enabled then there would have been new plans generated and the apply wouldn't have worked.
@lkysow Thanks for the confirmation. This bug is killing us right now. We want people to be able to see non-locking plans being run (in our normal CI pipeline) before approvals are submitted, so they can actually validate their code before blocking other development. If we wanted to dig into solving this, where would be a good place to start looking? I took a really quick glance through the repo and nothing jumped out at me.
Thank you for building this tool by the way, I really appreciate the work that went into this.
After re-reading the ticket, this isn't technically a bug (although for your use case it may as well be). Atlantis is just doing what you told it to; it's up to you to run atlantis plan if you've pushed a new commit and don't have autoplan running. The coupling with automerge is the real issue here. If you didn't have automerge, you'd quickly realize that you hadn't re-run plan and there wouldn't be an issue.
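One way to picture why the second apply both did nothing and still merged: if the automerge check amounts to "every pending plan was applied", it passes trivially when there are no pending plans. The sketch below is a toy model in Python, not Atlantis code. `PendingPlan`, `apply_all`, and `should_automerge` are hypothetical names, and it assumes that no saved plans remain for the pull request when the second apply arrives.

```python
from dataclasses import dataclass


@dataclass
class PendingPlan:
    """Hypothetical stand-in for a saved plan on the Atlantis server."""
    project: str
    applied: bool = False


def apply_all(pending):
    """Apply every pending plan and return the names of the projects applied."""
    applied = []
    for plan in pending:
        plan.applied = True
        applied.append(plan.project)
    return applied


def should_automerge(pending):
    """Toy automerge rule: merge when every pending plan has been applied."""
    # all() over an empty list is True, so "no plans" looks like "all applied".
    return all(plan.applied for plan in pending)


# Assumption: no saved plans remain when the second `atlantis apply` arrives.
pending_after_failure = []
applied_projects = apply_all(pending_after_failure)
print(f"Ran Apply for {len(applied_projects)} projects:")
print("automerge:", should_automerge(pending_after_failure))
```

In this model the only thing that blocks the merge is an unapplied plan that still exists, which is why a missing plan and a successful apply are indistinguishable.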
Also, if you were running with the -d or -p flags, you'd get an error that "the plan doesn't exist for that project, please run plan". When we added the apply-all command (i.e. atlantis apply) we didn't replicate that behaviour. I'm not sure if it ever makes sense to not give an error in this case, but I'd at least like to add a flag that lets you keep the old behaviour in case people were relying on it.
If we were to add some functionality to detect this case, it would be here: https://github.com/runatlantis/atlantis/blob/master/server/events/project_command_builder.go#L204 after Atlantis has found no pending plans. It could then exit with an error in this case.
I think a path forward may be:
- new flag --allow-no-plan-apply which defaults to false now (breaking change)
- thread that flag through and then check it at the line above
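The proposal above could look roughly like the following. This is a Python sketch of the control flow only, not the Go code in project_command_builder.go. `build_apply_all_commands`, `NoPendingPlansError`, and the `allow_no_plan_apply` parameter are hypothetical names that mirror the suggested --allow-no-plan-apply flag.

```python
class NoPendingPlansError(Exception):
    """Raised when apply-all finds nothing to apply (hypothetical error type)."""


def build_apply_all_commands(pending_plans, allow_no_plan_apply=False):
    """Return one apply command string per pending plan.

    pending_plans: list of (dir, workspace) tuples found for the pull request.
    allow_no_plan_apply: mirrors the proposed flag; False is the proposed default.
    """
    if not pending_plans and not allow_no_plan_apply:
        # Proposed new behaviour: fail loudly instead of applying nothing.
        raise NoPendingPlansError("no pending plans found; run `atlantis plan` first")
    return [f"apply dir={d} workspace={w}" for d, w in pending_plans]


# Normal case: one pending plan yields one apply command.
commands = build_apply_all_commands([("qa_acct/qa_env", "default")])

# No pending plans and the flag left at its default: the apply is rejected.
try:
    build_apply_all_commands([])
    rejected = False
except NoPendingPlansError:
    rejected = True

# Opting in to the old behaviour: an empty apply is allowed again.
legacy = build_apply_all_commands([], allow_no_plan_apply=True)
```

Defaulting the flag to false makes the empty apply an error, which is why it would be a breaking change for anyone relying on the current behaviour.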
Well, I actually think your original interpretation made sense to me. To clarify, we would never want atlantis to apply without having the most up to date plan saved and locked.
What we’re doing instead is just running validate, fmt, lint and terraform plan --lock=false in vanilla GitLab CI. Devs open an MR and need to fix any issues there, and get all approvals first, before the atlantis plan. The problem we were solving by doing it this way was autoplan opening locks too early in the process and thus blocking other MRs that were ready to be applied.
I still think this is a bug. Yes I wanted autoplan disabled but that just means I want the developers to run it if and only if all the pre-apply requirements are met. I would expect the apply step to run the same validation that the plan is locked and up to date and apply based on that. Turning off autoplan shouldn’t affect those checks. What seems to be happening with autoplan disabled is the apply is ignoring the plans and ultimately just applies nothing.
I can confirm there are plans and locks are created when they are supposed to be. It appears that the atlantis apply step is just ignoring those if a second apply is run after this first one fails. Expected behavior would be for the apply step to either force a replan if the MR was updated, or attempt to re-apply the original plan. It’s doing neither of these right now.
One thing I did notice was that if the apply does fail, the saved plans are deleted but the locks are left open (this may be the actual bug here). If we removed those after a failed apply, that would basically force the plan step. I don’t know if that’s the best solution but I think it would work.
it's up to you to run atlantis plan if you've pushed a new commit and don't have autoplan running.
Our team has autoplan on, but pushing a new commit doesn't cause Atlantis to redo the plan (because Bitbucket).
One thing I did notice was that if the apply does fail, the saved plans are deleted but the locks are left open (this may be the actual bug here). If we removed those after a failed apply, that would basically force the plan step. I don’t know if that’s the best solution but I think it would work.
I agree we should either have Atlantis not delete the plans, or error if an apply is attempted without any plans.
@lkysow - what's the reason for Atlantis to delete the plans after a failed apply? It could have failed because of a transient provider issue, and re-running apply on the same plan would later succeed.
Hi everyone, I ran into this issue too. Is there any work in progress to fix this bug? I removed the locks as mentioned above and re-ran "atlantis plan". It still shows "Ran Plan for 0 projects:".
Running plan again on the same PR after a failed apply should not be any different from Atlantis keeping the plan; it is just an extra step.
But if someone else modifies, in another PR, the environment you are planning against, you will have a problem no matter what, and by re-running plan you could actually find the drift.
I do not think this is a bug. It is a bit annoying to run plan again, but since Terraform is idempotent it should only apply the difference.
I can run atlantis plan again and I am still getting the output "Ran Plan for 0 projects:".
If I run with atlantis plan -p *-production it will apply.
With autoplan, you need to define every directory you want autoplan on/off for in your atlantis.yaml, otherwise it does not work. Is that what you are doing?
If this were a general bug, no one would be using Atlantis, so I want to find out whether this is specific to a multi-dir structure or similar. For that, we need to see the atlantis.yaml files and directory structure so we can get a better idea.
This could be as simple as better documentation of autoplan with some examples.
@jamengual I am using an atlantis.yaml that was previously working. I think around v0.19.* this started breaking. It is about 50 projects, each with its own project name so that the -p wildcard flag can be used. The pattern for the naming is ${service_name}-${environment}.
I discovered that if I run atlantis apply -p *-environment, then the command will run, but it will run for ALL projects, regardless of what files have changed.
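To illustrate why a name wildcard ignores which files changed, here is a small Python sketch. It assumes the -p wildcard behaves like a shell-style glob over project names; the thread does not show how Atlantis actually matches it, and the project names and `select_projects` helper are made up for the example.

```python
from fnmatch import fnmatchcase

# Hypothetical project names following the ${service_name}-${environment} pattern.
project_names = ["billing-staging", "billing-production", "search-production"]


def select_projects(names, pattern):
    """Return the names that match a shell-style glob pattern (case-sensitive)."""
    return [name for name in names if fnmatchcase(name, pattern)]


# Selection depends only on the project name, never on the files changed in the PR.
selected = select_projects(project_names, "*-production")
print(selected)
```

Under that assumption, every project whose name fits the pattern is selected, which matches the observation that the command runs for all matching projects rather than only the modified ones.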
I have autoplan on, but if I run atlantis plan manually, it doesn't seem to make a difference.
Also of note, I am using custom workflows, not sure if that makes a difference.
@evanstachowiak Please test with the pre-release image, we did some bug fixes there and I wonder if that could be the issue:
docker pull ghcr.io/runatlantis/atlantis:v0.19.3-pre.20220408
Is this still an issue with v0.19.8?
Hello @jamengual I was able to reproduce this issue on v0.19.8, using the testdrive repository.
It only happened when using pre workflow hooks, such as the following:
---
repos:
- id: /.*/
  pre_workflow_hooks:
  - run: echo "hello world"
The server logs for the execution:
{"level":"info","ts":"2022-09-22T13:58:58.502-0300","caller":"server/server.go:869","msg":"Atlantis started - listening on port 4141","json":{}}
{"level":"info","ts":"2022-09-22T13:58:58.502-0300","caller":"scheduled/executor_service.go:46","msg":"Scheduled Executor Service started","json":{}}
{"level":"info","ts":"2022-09-22T13:59:09.305-0300","caller":"events/events_controller.go:533","msg":"parsed comment as command=\"apply\" verbose=false dir=\"\" workspace=\"\" project=\"\" flags=\"\"","json":{"gh-request-id":"X-Github-Delivery=dfb30ec0-3a97-11ed-9f80-6ecf217e25c6"}}
{"level":"info","ts":"2022-09-22T13:59:14.712-0300","caller":"events/working_dir.go:225","msg":"creating dir \"/home/gus/workspace/opensource/apply-for-0-projects-test/atlantis_linux_amd64/data/repos/GusAntoniassi/atlantis-example/1/default\"","json":{"repo":"GusAntoniassi/atlantis-example","pull":"1"}}
{"level":"info","ts":"2022-09-22T13:59:15.360-0300","caller":"runtime/pre_workflow_hook_runner.go:50","msg":"successfully ran \"echo \\\"hello world\\\"\" in \"/home/gus/workspace/opensource/apply-for-0-projects-test/atlantis_linux_amd64/data/repos/GusAntoniassi/atlantis-example/1/default\"","json":{"repo":"GusAntoniassi/atlantis-example","pull":"1"}}
yes it's still an issue @jamengual
I wonder if this is related to this : https://github.com/runatlantis/atlantis/pull/1633
pre_workflow_hooks run before any atlantis.yaml file is parsed.
After that, if no atlantis.yaml is defined, it will do nothing.
This issue is stale because it has been open for 1 month with no activity. Remove stale label or comment or this will be closed in 1 month.
@jamengual Hello! Recently I reproduced that problem on v0.25.0
Also, I'm using pre-workflow hooks as described above. Is it possible to reopen this issue to fix this bug?
can you describe the steps you took to reproduce it?
Sure!
atlantis-0:/$ atlantis version
atlantis v0.25.0 (commit: a12823e) (build date: 2023-08-11T20:51:19.440Z)
Repos config:
repos:
- id: "/.*/"
  branch: "/.*/"
  workflow: check
  allow_custom_workflows: true
  allowed_overrides: [workflow, delete_source_branch_on_merge]
  apply_requirements: [approved]
  pre_workflow_hooks:
  - run: python3 code/atlantis_config_merge.py # script for generating atlantis.yaml
workflows:
  check:
    plan:
      steps:
      - run: echo "check passed"
  terragrunt-tst:
    plan:
      steps:
      - env:
          ...
      - run: |
          if [ ! -d "/tmp/$BASE_REPO_OWNER-$BASE_REPO_NAME-$PULL_NUM" ]; then
            mkdir -p /tmp/$BASE_REPO_OWNER-$BASE_REPO_NAME-$PULL_NUM
          fi
      - run: terragrunt run-all plan -out ./plan.tfplan --terragrunt-non-interactive &> /tmp/$BASE_REPO_OWNER-$BASE_REPO_NAME-$PULL_NUM/output.txt || cat /tmp/$BASE_REPO_OWNER-$BASE_REPO_NAME-$PULL_NUM/output.txt
      - run: terragrunt run-all show -json ./plan.tfplan --terragrunt-non-interactive 2> /tmp/$BASE_REPO_OWNER-$BASE_REPO_NAME-$PULL_NUM/show_stderr.txt 1> ./plan.json || cat /tmp/$BASE_REPO_OWNER-$BASE_REPO_NAME-$PULL_NUM/show_stderr.txt
      - run: /tmp/infracost breakdown --path=. --format=json --log-level=info --out-file=./infracost.json --project-name=$REPO_REL_DIR 2>> /tmp/$BASE_REPO_OWNER-$BASE_REPO_NAME-$PULL_NUM/output.txt 1>> /tmp/$BASE_REPO_OWNER-$BASE_REPO_NAME-$PULL_NUM/output.txt || cat /tmp/$BASE_REPO_OWNER-$BASE_REPO_NAME-$PULL_NUM/output.txt
      - run: /tmp/infracost output --path=./infracost.json --format=json --out-file=./infracost-report.json 2>> /tmp/$BASE_REPO_OWNER-$BASE_REPO_NAME-$PULL_NUM/output.txt 1>> /tmp/$BASE_REPO_OWNER-$BASE_REPO_NAME-$PULL_NUM/output.txt || cat /tmp/$BASE_REPO_OWNER-$BASE_REPO_NAME-$PULL_NUM/output.txt
      - run: |
          /tmp/infracost comment gitlab --repo $BASE_REPO_OWNER/$BASE_REPO_NAME \
            --merge-request $PULL_NUM \
            --path ./infracost-report.json \
            --gitlab-token $ATLANTIS_GITLAB_TOKEN \
            --behavior new \
            --show-all-projects
      # script for output formatting. Not sure if it's relevant for this issue. Just to share
      - run: python3 /opt/terragrunt_output_formatter.py --file /tmp/$BASE_REPO_OWNER-$BASE_REPO_NAME-$PULL_NUM/output.txt --output-file /tmp/$BASE_REPO_OWNER-$BASE_REPO_NAME-$PULL_NUM/fmt_output.txt && cat /tmp/$BASE_REPO_OWNER-$BASE_REPO_NAME-$PULL_NUM/fmt_output.txt
      - run: rm -rf /tmp/$BASE_REPO_OWNER-$BASE_REPO_NAME-$PULL_NUM
    apply:
      steps:
      - env:
          ...
      - run: terragrunt run-all apply ./plan.tfplan --terragrunt-non-interactive
atlantis.yaml example:
projects:
- autoplan:
    when_modified:
    - '**/*.hcl'
    - '*.hcl'
  dir: accounts/...
  name: ...
  workflow: terragrunt-tst
As a result, I have an MR message:
Ran Apply for 0 projects:
atlantis apply -p ... solves the problem, but it's not convenient to use it every time.