Atlantis 0.17.2 not always creating 'default' working dir

Open srlightbody opened this issue 4 years ago • 8 comments

We've noticed some odd behavior after upgrading from 0.16.1 to 0.17.2. The behavior is:

  • User creates a PR in GitHub
  • Atlantis creates the repo folder and PR number folder in /home/atlantis/.atlantis/repos
  • Atlantis does not create the default directory nor clone into it
  • Atlantis attempts to check if the default workspace exists and fails with an error.

Here's some debug level log output showing the issue:

021-07-21 13:24:48.491 MDT{caller: events/events_controller.go:417, json: {…}, level: info, msg: parsed comment as command="plan" verbose=false dir="" workspace="company-daily" project="" flags="", ts: 2021-07-21T19:24:48.490Z}
2021-07-21 13:24:48.491 MDT{caller: events/events_controller.go:439, json: {…}, level: debug, msg: executing command, ts: 2021-07-21T19:24:48.490Z}
2021-07-21 13:24:48.491 MDT{caller: server/middleware.go:37, json: {…}, level: debug, msg: POST /events – respond HTTP 200, ts: 2021-07-21T19:24:48.490Z}
2021-07-21 13:24:48.818 MDT{caller: server/server.go:749, json: {…}, level: info, msg: Apply Lock: {false 0001-01-01 00:00:00 +0000 UTC }, ts: 2021-07-21T19:24:48.818Z}
2021-07-21 13:24:48.885 MDT{caller: server/server.go:749, json: {…}, level: info, msg: Apply Lock: {false 0001-01-01 00:00:00 +0000 UTC }, ts: 2021-07-21T19:24:48.883Z}
2021-07-21 13:24:49.245 MDT{caller: events/project_command_builder.go:287, json: {…}, level: debug, msg: building plan command, ts: 2021-07-21T19:24:49.244Z}
2021-07-21 13:24:49.245 MDT{caller: events/project_command_builder.go:294, json: {…}, level: debug, msg: cloning repository, ts: 2021-07-21T19:24:49.244Z}
2021-07-21 13:24:49.245 MDT{caller: events/working_dir.go:202, json: {…}, level: info, msg: creating dir "/home/atlantis/.atlantis/repos/company/atlantis-foo/218/company-daily", ts: 2021-07-21T19:24:49.244Z}
2021-07-21 13:24:49.884 MDT{caller: events/working_dir.go:268, json: {…}, level: debug, msg: ran: git clone --branch 5625048_daily_staging --depth=1 --single-branch https://companyatlantis:<redacted>@github.com/company/atlantis-foo.git /home/atlantis/.atlantis/repos/company/atlantis-foo/218/company-daily. Output: Cloning into '/h…
2021-07-21 13:24:49.886 MDT{caller: server/server.go:749, json: {…}, level: info, msg: Apply Lock: {false 0001-01-01 00:00:00 +0000 UTC }, ts: 2021-07-21T19:24:49.886Z}
2021-07-21 13:24:50.226 MDT{caller: events/pull_updater.go:14, json: {…}, level: error, msg: checking if workspace exists: stat /home/atlantis/.atlantis/repos/company/atlantis-foo/218/default: no such file or directory, stacktrace: github.com/runatlantis/atlantis/server/events.(*PullUpdater).updatePull /home/circleci/proje…

The full log for that last line is - { "caller": "events/pull_updater.go:14", "json": { ... }, "msg": "checking if workspace exists: stat /home/atlantis/.atlantis/repos/companymaps/atlantis-foo/218/default: no such file or directory", "stacktrace": "github.com/runatlantis/atlantis/server/events.(*PullUpdater).updatePull /home/circleci/project/server/events/pull_updater.go:14 github.com/runatlantis/atlantis/server/events.(*PlanCommandRunner).run /home/circleci/project/server/events/plan_command_runner.go:162 github.com/runatlantis/atlantis/server/events.(*PlanCommandRunner).Run /home/circleci/project/server/events/plan_command_runner.go:223 github.com/runatlantis/atlantis/server/events.(*DefaultCommandRunner).RunCommentCommand /home/circleci/project/server/events/command_runner.go:212", "ts": "2021-07-21T19:24:50.225Z", "level": "error" }

The issue is intermittent, i.e. I can close out a PR that has had the issue, open a new one with the same commits in it, and the new one will work just fine. Rolling Atlantis back to 0.16.1 completely resolves the issue.

I've spent some time today digging around, and I think it may be related to the change introduced in #1620 in some way: it seems like Atlantis is attempting to use the default directory without it ever being initialized. We do use a custom workflow for our planning step that adds a simplified output comment for users, and an atlantis.yaml with file-specific autoplan triggers, but if it's an interaction with those, I have not figured it out yet.

srlightbody avatar Jul 21 '21 23:07 srlightbody

@srlightbody Are you triggering atlantis plan through a GitHub comment? I'm thinking that this might be caused when autoplan is not triggered due to no changes in the code and no pre_workflow_hook is present. If either autoplan or a pre_workflow_hook is present, it will create the default folder. If neither exists and you trigger the plan with a PR comment (i.e. atlantis plan -w <workspace-name>), this error will happen. Is that the case?
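
The condition described here can be mimicked at the filesystem level. The sketch below is not Atlantis code: it only reproduces the directory layout seen in the logs (repo/pull/workspace) inside a temporary directory, and the helper names clone_into_workspace and workspace_exists are hypothetical. It also takes as an assumption the suggestion that autoplan or a pre_workflow_hook is what creates the default folder.

```python
import os
import tempfile


def clone_into_workspace(pull_dir: str, workspace: str) -> str:
    """Hypothetical stand-in for the clone step: creates only the requested workspace dir."""
    path = os.path.join(pull_dir, workspace)
    os.makedirs(path, exist_ok=True)
    return path


def workspace_exists(pull_dir: str, workspace: str) -> bool:
    """Hypothetical stand-in for the later check, which stats a workspace dir."""
    try:
        os.stat(os.path.join(pull_dir, workspace))
        return True
    except FileNotFoundError:
        return False


with tempfile.TemporaryDirectory() as data_dir:
    pull_dir = os.path.join(data_dir, "repos", "company", "atlantis-foo", "218")

    # A comment-triggered plan with -w company-daily creates only that directory.
    clone_into_workspace(pull_dir, "company-daily")
    named_ok = workspace_exists(pull_dir, "company-daily")
    default_before = workspace_exists(pull_dir, "default")

    # Assumption from this comment: autoplan or a pre_workflow_hook would
    # also have created the 'default' directory.
    clone_into_workspace(pull_dir, "default")
    default_after = workspace_exists(pull_dir, "default")

print(named_ok, default_before, default_after)
```

If the hypothesis holds, a check against default fails only in the window where nothing has created that directory yet, which would match the intermittent behavior reported.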

msarvar avatar Jul 23 '21 02:07 msarvar

I've done some more digging and I think there were 2 distinct issues going on that made this extra confusing. A bunch of our webhooks were failing with a 301 after the upgrade: the URL we were using as the hook target ended in a ".", i.e. https://atlantis.endpoint./events. For some reason that started causing a 301. I've since rolled out a change that fixes the hooks, and am going to retry the upgrade to 0.17.2 today so I can do more thorough testing.
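
For anyone auditing their own webhook targets for the same trailing-dot hostname, a small check along these lines may help. It is only a sketch using Python's standard urllib.parse; the function name strip_trailing_dot is made up for illustration, and it says nothing about why the redirect started happening.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_trailing_dot(url: str) -> str:
    """Return url with a trailing '.' removed from the hostname, if present."""
    parts = urlsplit(url)
    host = parts.hostname  # lowercased by urlsplit; hostnames are case-insensitive
    if not host or not host.endswith("."):
        return url

    # Rebuild the network location, preserving any port and userinfo.
    netloc = host.rstrip(".")
    if parts.port is not None:
        netloc = f"{netloc}:{parts.port}"
    if "@" in parts.netloc:
        userinfo = parts.netloc.rsplit("@", 1)[0]
        netloc = f"{userinfo}@{netloc}"
    return urlunsplit(parts._replace(netloc=netloc))


fixed = strip_trailing_dot("https://atlantis.endpoint./events")
print(fixed)
```

A URL whose hostname has no trailing dot is returned unchanged, so the helper can be run over every configured hook target.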

That being said, when the issue was occurring it was with autoplans being prompted by an atlantis.yaml in the repo. We trigger autoplans based on changed file, and select a workspace as part of that. The default workspace is unused.

srlightbody avatar Jul 23 '21 16:07 srlightbody

@srlightbody This is definitely a bug and needs to be fixed. I think one potential workaround could be adding a no-op pre_workflow_hook. Can you try adding the following to the config:

pre_workflow_hooks:
   - run: echo "do nothing"

Let me know if that mitigates the issue for the time being.

msarvar avatar Jul 27 '21 20:07 msarvar

I get this same error under the same conditions. We are updating from 0.16.1 -> 0.17.2. Autoplans are disabled and we currently do not have a pre_workflow_hook. As a workaround, I was able to get past this by adding the following to my repos.yaml:

  pre_workflow_hooks:
    - run: echo "workaround"

askmike1 avatar Aug 12 '21 17:08 askmike1

Hi there!

I'm facing a similar situation. Upgrading from version v0.16.1 to anything above and including v0.17.2 makes all my plans fail with that same error:

"checking if workspace exists: stat /home/atlantis/.atlantis/repos/${repo-name}/terraform/${pull-request-id}/default: no such file or directory"

Important context:

  • Running on bitbucket cloud
  • Plan/apply triggered via comments only
  • Using workspaces

Example command in a comment:

atlantis plan -d path/to/changes -w foo

Atlantis logs for above command:

{"level":"info","ts":"2022-08-17T12:21:15.342Z","caller":"events/events_controller.go:417","msg":"parsed comment as command=\"plan\" verbose=false dir=\"path/to/changes\" workspace=\"foo\" project=\"\" flags=\"\"","json":{}}
{"level":"info","ts":"2022-08-17T12:21:15.825Z","caller":"events/working_dir.go:202","msg":"creating dir \"/home/atlantis/.atlantis/repos/my-org/my-repo/3843/foo\"","json":{"repo":"my-org/my-repo","pull":"3843"}}
{"level":"error","ts":"2022-08-17T12:21:18.698Z","caller":"events/pull_updater.go:14","msg":"checking if workspace exists: stat /home/atlantis/.atlantis/repos/my-org/my-repo/3843/default: no such file or directory","json":{"repo":"my-org/my-repo","pull":"3843"},"stacktrace":"github.com/runatlantis/atlantis/server/events.(*PullUpdater).updatePull\n\t/home/runner/work/atlantis/atlantis/server/events/pull_updater.go:14\ngithub.com/runatlantis/atlantis/server/events.(*PlanCommandRunner).run\n\t/home/runner/work/atlantis/atlantis/server/events/plan_command_runner.go:162\ngithub.com/runatlantis/atlantis/server/events.(*PlanCommandRunner).Run\n\t/home/runner/work/atlantis/atlantis/server/events/plan_command_runner.go:223\ngithub.com/runatlantis/atlantis/server/events.(*DefaultCommandRunner).RunCommentCommand\n\t/home/runner/work/atlantis/atlantis/server/events/command_runner.go:212"}

Additional comments

  • Notice how Atlantis first creates a directory using the workspace name ("msg":"creating dir \"/home/atlantis/.atlantis/repos/my-org/my-repo/3843/foo\") and then looks for a directory named default ("msg":"checking if workspace exists: stat /home/atlantis/.atlantis/repos/my-org/my-repo/3843/default: no such file or directory").
  • Running version v0.17.1 works fine - will stick to it until the issue is fixed.
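
The mismatch pointed out in the first bullet can also be spotted mechanically in JSON-formatted logs. The following is a rough sketch, assuming one JSON object per line and the two message formats shown in the logs above; the function name find_workspace_mismatches is hypothetical and the sample entries are trimmed to the "level" and "msg" fields.

```python
import json
import re
from collections import defaultdict

# Message formats taken from the log lines quoted in this thread.
CREATED = re.compile(r'^creating dir "(?P<path>[^"]+)"$')
MISSING = re.compile(
    r"^checking if workspace exists: stat (?P<path>\S+): no such file or directory$"
)


def find_workspace_mismatches(log_lines):
    """Return (pull_dir, missing_workspace, created_workspaces) for each failed check."""
    created = defaultdict(set)  # pull dir -> workspace dirs that were created
    mismatches = []
    for line in log_lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip non-JSON lines
        if not isinstance(entry, dict):
            continue
        msg = entry.get("msg", "")
        match = CREATED.match(msg)
        if match:
            pull_dir, _, workspace = match.group("path").rpartition("/")
            created[pull_dir].add(workspace)
            continue
        match = MISSING.match(msg)
        if match:
            pull_dir, _, workspace = match.group("path").rpartition("/")
            mismatches.append((pull_dir, workspace, sorted(created[pull_dir])))
    return mismatches


sample = [
    json.dumps({"level": "info", "msg": 'creating dir "/home/atlantis/.atlantis/repos/my-org/my-repo/3843/foo"'}),
    json.dumps({"level": "error", "msg": "checking if workspace exists: stat /home/atlantis/.atlantis/repos/my-org/my-repo/3843/default: no such file or directory"}),
]
result = find_workspace_mismatches(sample)
print(result)
```

Each result tuple pairs the workspace directory that was looked up with the ones that were actually created for the same pull request, which makes it easy to see that foo was created while default was checked.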

emulanob avatar Aug 17 '22 12:08 emulanob

is this still an issue with v0.19.8?

jamengual avatar Aug 26 '22 03:08 jamengual

Hi @jamengual. Yes, I started upgrading from v0.16.1 to v0.19.8, which failed, and then downgraded until I reached one that worked.

emulanob avatar Aug 26 '22 10:08 emulanob

I'm working with the latest version, which is currently v0.19.9-pre.2022082, and the same error is happening.

  • Git: bitbucket cloud
  • Using workspaces

The pre_workflow_hooks workaround mentioned earlier makes it work.

j0rzsh avatar Sep 01 '22 11:09 j0rzsh

I've been using Atlantis for a year and recently encountered an error with version 0.18.2.0. [Three screenshots attached, dated 2022-11-21.]

sujeets-toast avatar Nov 21 '22 06:11 sujeets-toast

It's possible the atlantis pod ran out of space?

Please also try with the latest version 0.20.1.

nitrocode avatar Nov 21 '22 06:11 nitrocode

Thanks for your reply. I created a new repository with the same name as the one it is currently using. It's working for me. Due to time constraints, I will upgrade the Atlantis image later because it will necessitate a significant amount of testing for us. 

sujeets-toast avatar Nov 21 '22 11:11 sujeets-toast

This happened to me when trying to run atlantis plan via comment on an empty PR. I pushed a commit with a trivial change and the atlantis plan via comment worked.

tekumara avatar Mar 05 '23 23:03 tekumara

My org has run into this same issue a number of times. It seems to randomly resolve itself at some point. We also use custom workspaces. This is still happening with the latest version of Atlantis (v0.23.3).

hskrtich avatar Mar 27 '23 19:03 hskrtich

Same issue here on the latest version. It occurs on all new PRs until one runs atlantis plan, e.g. I cannot run atlantis plan -p project_name without running atlantis plan first.

bml1g12 avatar May 23 '23 08:05 bml1g12

This is also happening for us. We were using 0.19.9 and recently upgraded to 0.24.2.

inkel avatar Jun 02 '23 20:06 inkel

Any updates for this issue? We are having the same problem here in the organization, when updating from v0.17.0 to v0.24.3

Jonathanboliveira avatar Jul 11 '23 21:07 Jonathanboliveira

Hello! Any updates for this issue? We are having the same problem here in the organization, on v0.25.0

kelvingl avatar Aug 14 '23 18:08 kelvingl

We are documenting the Locks flow, which includes part of the cloning process too. After that we will try to figure out a way to make this more stable: https://github.com/runatlantis/atlantis/pull/3345

jamengual avatar Aug 14 '23 19:08 jamengual

Hi! In my organisation we are also facing this, and we are using the 0.25.0 version. We have some reproducible cases:

  • You have an empty PR and you run a manual plan followed by the atlantis unlock command. When you run atlantis plan again, the system outputs checking if workspace exists: stat /home/atlantis/.atlantis/repos. The only way to fix this is to push a commit to the PR and try to replan.

We have other cases, but they have been difficult to reproduce.

carmennavarreteh avatar Jan 04 '24 14:01 carmennavarreteh