With defined gh-team-allowlist Atlantis randomly stops working with 401 Unauthorized body

Open komljen opened this issue 2 years ago • 15 comments

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request. Searching for pre-existing feature requests helps us consolidate datapoints for identical requirements into a single place, thank you!
  • Please do not leave "+1" or other comments that do not add relevant new information or questions, they generate extra noise for issue followers and do not help prioritize the request.
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment.

Overview of the Issue

With defined gh-team-allowlist, Atlantis randomly stops working with the following error when running the plan:

{"level":"error","ts":"2022-04-05T15:33:04.300Z","caller":"events/command_runner.go:219","msg":"Unable to check user permissions: non-200 OK status code: 401 Unauthorized body: \"{\\\"message\\\":\\\"Bad credentials\\\",\\\"documentation_url\\\":\\\"https://docs.github.com/graphql\\\"}\"","json":{},"stacktrace":"github.com/runatlantis/atlantis/server/events.(*DefaultCommandRunner).RunCommentCommand\n\tgithub.com/runatlantis/atlantis/server/events/command_runner.go:219"}

A restart of the pod fixes it, but it breaks again after a few hours.

Atlantis version: v0.19.2

Config:

disable-apply-all: true
enable-diff-markdown-format: true
enable-regexp-cmd: true
gh-app-id: <ID>
gh-app-key-file: /atlantis/gh-app-key-file.pem
gh-app-slug: atlantis-faire
gh-org: Faire
gh-team-allowlist: "*:plan,*:unlock,backend-platform:*,data-infra:*"
gh-webhook-secret: <SECRET>
hide-prev-plan-comments: true
write-git-creds: true
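For readers unfamiliar with the gh-team-allowlist format used above: the value is a comma-separated list of team:command pairs, where * stands for any team or any command. The Go sketch below shows how such a string could be interpreted. It is an illustration only and is not taken from the Atlantis source; parseAllowlist, allowed, and the case-insensitive matching are assumptions made for the example. Note that evaluating the team side requires looking up the commenter's teams on GitHub, which is the step that fails in the log line above.

```go
package main

import (
	"fmt"
	"strings"
)

// rule is one "team:command" pair from the allowlist string.
type rule struct {
	team    string
	command string
}

// parseAllowlist splits a comma-separated list of team:command pairs.
// Hypothetical helper for illustration; not the Atlantis implementation.
func parseAllowlist(s string) ([]rule, error) {
	var rules []rule
	for _, pair := range strings.Split(s, ",") {
		pair = strings.TrimSpace(pair)
		if pair == "" {
			continue
		}
		team, command, ok := strings.Cut(pair, ":")
		if !ok || team == "" || command == "" {
			return nil, fmt.Errorf("malformed pair %q, want team:command", pair)
		}
		rules = append(rules, rule{team: team, command: command})
	}
	return rules, nil
}

// allowed reports whether any of the user's teams may run the command.
// "*" on either side of a pair matches anything.
func allowed(rules []rule, userTeams []string, command string) bool {
	for _, r := range rules {
		if r.command != "*" && !strings.EqualFold(r.command, command) {
			continue
		}
		if r.team == "*" {
			return true
		}
		for _, t := range userTeams {
			if strings.EqualFold(r.team, t) {
				return true
			}
		}
	}
	return false
}

func main() {
	rules, err := parseAllowlist("*:plan,*:unlock,backend-platform:*,data-infra:*")
	if err != nil {
		panic(err)
	}
	fmt.Println(allowed(rules, []string{"frontend"}, "plan"))    // true: *:plan
	fmt.Println(allowed(rules, []string{"frontend"}, "apply"))   // false: no matching pair
	fmt.Println(allowed(rules, []string{"data-infra"}, "apply")) // true: data-infra:*
}
```

With the config above, anyone can plan and unlock, while only members of backend-platform and data-infra can run everything else.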

I also tried v0.19.1, but it failed with the following error:

"Unable to check user permissions: struct field for \"__schema\" doesn't exist in any of 1 places to unmarshal

However, that is expected according to the release notes of the latest version.

komljen avatar Apr 06 '22 10:04 komljen

the struct issue you are reporting was fixed in https://github.com/runatlantis/atlantis/pull/2128

jamengual avatar May 12 '22 18:05 jamengual

I'm not reporting that issue in this one. This is non-200 OK status code: 401 Unauthorized body with v0.19.2. I just mentioned that I tried v0.19.1 as well and got the issue that is already fixed, but that is ok and expected.

komljen avatar May 15 '22 11:05 komljen

understood

jamengual avatar May 15 '22 17:05 jamengual

@komljen I found this article. This might be related to rate limiting, with a misleading error message. Could you check your rate limit when it happens again?
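One way to check the rate limit is GitHub's REST endpoint GET /rate_limit, which reports the remaining quota per resource, including graphql. Below is a minimal Go sketch, assuming the token is supplied in an environment variable named GITHUB_TOKEN (a name chosen for this example; with a GitHub App it would have to be an installation access token). The type and function names are likewise made up for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"os"
	"time"
)

// bucket mirrors one entry under "resources" in the /rate_limit response.
type bucket struct {
	Limit     int   `json:"limit"`
	Used      int   `json:"used"`
	Remaining int   `json:"remaining"`
	Reset     int64 `json:"reset"` // Unix epoch seconds
}

type rateLimitResponse struct {
	Resources map[string]bucket `json:"resources"`
}

// parseRateLimit decodes the JSON body returned by GET /rate_limit.
func parseRateLimit(body []byte) (rateLimitResponse, error) {
	var r rateLimitResponse
	err := json.Unmarshal(body, &r)
	return r, err
}

func run() error {
	req, err := http.NewRequest(http.MethodGet, "https://api.github.com/rate_limit", nil)
	if err != nil {
		return err
	}
	// GITHUB_TOKEN is an assumed variable name for this sketch.
	req.Header.Set("Authorization", "Bearer "+os.Getenv("GITHUB_TOKEN"))
	req.Header.Set("Accept", "application/vnd.github+json")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return err
	}
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("unexpected status %s: %s", resp.Status, body)
	}

	rl, err := parseRateLimit(body)
	if err != nil {
		return err
	}
	for _, name := range []string{"core", "graphql"} {
		b := rl.Resources[name]
		reset := time.Unix(b.Reset, 0).UTC().Format(time.RFC3339)
		fmt.Printf("%s: %d/%d remaining, resets at %s\n", name, b.Remaining, b.Limit, reset)
	}
	return nil
}

func main() {
	if err := run(); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```

If this request itself comes back with 401, the credential is being rejected, which points away from quota as the cause.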

raymondchen625 avatar May 19 '22 14:05 raymondchen625

Interesting, will check that and report on the findings.

komljen avatar May 21 '22 08:05 komljen

This is an interesting finding: https://github.com/runatlantis/atlantis/issues/2285#issuecomment-1152365866. So it works with token auth but not with the GH App.

komljen avatar Jun 13 '22 09:06 komljen

@komljen Yeah, we've now been able to run for multiple days with 0.19.3 and the user+token authentication instead of GH App. With the GH App authentication, we could only go a few hours at most.

With 0.17.5, the GH App route worked perfectly fine.

cjbehm avatar Jun 14 '22 13:06 cjbehm

interesting:

if you switch right now to GH app and 0.17.5 with gh-team-allowlist does it work for you?

I'm trying to understand why this could be.

jamengual avatar Jun 14 '22 15:06 jamengual

@jamengual I'm not using the GH team allowlist feature; I was just confirming @komljen's comment, so I can't test that out. (Also, the GH team allowlist feature was added in 0.18 and moved to GraphQL in 0.18.3.)

I do think that #2285 and this issue could be the same root cause, but I created that issue specifically because our errors arose without using any new features; just as a pure version upgrade.

cjbehm avatar Jun 14 '22 16:06 cjbehm

I'm starting to believe that API call throttling is what is causing this, and the error message does not help much.

I'm hoping the GitHub API will become more descriptive about the real issue behind it and, ideally, expose metrics around API calls.

jamengual avatar Jun 14 '22 17:06 jamengual

Could Atlantis request and log rate limit info in its query per https://docs.github.com/en/graphql/overview/resource-limitations ?

It's hard to imagine throttling as the source when our problem in #2285 disappeared by switching to token auth instead of GH App, but GitHub's API response on its own is nearly useless.
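For reference, the GraphQL API exposes a rateLimit object (limit, cost, remaining, resetAt) that can be requested in a query. The Go sketch below sends that field on its own and prints the result. It is a standalone illustration under stated assumptions and not how Atlantis builds its queries: the GITHUB_TOKEN environment variable and all type and function names are invented for this example.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"os"
)

// rateLimitQuery asks only for the rateLimit object; in a real client the
// same field could be added next to the fields a query already selects.
const rateLimitQuery = `query { rateLimit { limit cost remaining resetAt } }`

type rateLimit struct {
	Limit     int    `json:"limit"`
	Cost      int    `json:"cost"`
	Remaining int    `json:"remaining"`
	ResetAt   string `json:"resetAt"`
}

type graphQLResponse struct {
	Data struct {
		RateLimit rateLimit `json:"rateLimit"`
	} `json:"data"`
	Errors []struct {
		Message string `json:"message"`
	} `json:"errors"`
}

// buildRequestBody wraps a query string in the JSON envelope GraphQL expects.
func buildRequestBody(query string) ([]byte, error) {
	return json.Marshal(map[string]string{"query": query})
}

// parseGraphQLRateLimit extracts the rateLimit object from a response body.
func parseGraphQLRateLimit(body []byte) (rateLimit, error) {
	var r graphQLResponse
	if err := json.Unmarshal(body, &r); err != nil {
		return rateLimit{}, err
	}
	if len(r.Errors) > 0 {
		return rateLimit{}, fmt.Errorf("graphql error: %s", r.Errors[0].Message)
	}
	return r.Data.RateLimit, nil
}

func run() error {
	payload, err := buildRequestBody(rateLimitQuery)
	if err != nil {
		return err
	}
	req, err := http.NewRequest(http.MethodPost, "https://api.github.com/graphql", bytes.NewReader(payload))
	if err != nil {
		return err
	}
	// GITHUB_TOKEN is an assumed variable name for this sketch.
	req.Header.Set("Authorization", "Bearer "+os.Getenv("GITHUB_TOKEN"))
	req.Header.Set("Content-Type", "application/json")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return err
	}
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("non-200 status %s: %s", resp.Status, body)
	}
	rl, err := parseGraphQLRateLimit(body)
	if err != nil {
		return err
	}
	fmt.Printf("cost=%d remaining=%d/%d resetAt=%s\n", rl.Cost, rl.Remaining, rl.Limit, rl.ResetAt)
	return nil
}

func main() {
	if err := run(); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```

Logging remaining and resetAt alongside each permission check would show whether the quota was actually near zero when the 401 appeared.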


cjbehm avatar Jun 14 '22 18:06 cjbehm

Could Atlantis request and log rate limit info in its query per https://docs.github.com/en/graphql/overview/resource-limitations ?

PRs are welcome.

It's hard to imagine throttling as the source when our problem in #2285 disappeared by switching to token auth instead of GH App, but GitHub's API response on its own is nearly useless.

Exactly, how do we know if the response is so cryptic?

jamengual avatar Jun 14 '22 18:06 jamengual

is this still happening in v0.19.8?

jamengual avatar Aug 26 '22 04:08 jamengual

is this still happening in v0.19.8?

I haven't tried that version yet; I'll wait for this PR: https://github.com/runatlantis/atlantis/pull/2479. It seems like a proper fix for this issue.

komljen avatar Sep 08 '22 08:09 komljen

that is correct, I think that is going to be the fix.

It should be available today in the pre-release

jamengual avatar Sep 08 '22 14:09 jamengual

+1

jullianow avatar Oct 03 '22 19:10 jullianow

this has been already fixed, test the new version

jamengual avatar Oct 03 '22 19:10 jamengual

Yes, forgot to update here, but no issues with the latest version.

komljen avatar Oct 03 '22 21:10 komljen

this has been already fixed, test the new version

we are still hitting it with latest 0.19.8

Edit: https://github.com/runatlantis/atlantis/commit/a4a49bf46fb2ea83804d7b8fa2dae3e4c5646a01 I see this is in 0.19.9 🤞

primeroz avatar Oct 05 '22 10:10 primeroz