atlantis
atlantis copied to clipboard
"Invalid Key" when attempting to stream logs over websocket
Community Note
- Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request. Searching for pre-existing feature requests helps us consolidate datapoints for identical requirements into a single place, thank you!
- Please do not leave "+1" or other comments that do not add relevant new information or questions, they generate extra noise for issue followers and do not help prioritize the request.
- If you are interested in working on this issue or have submitted a pull request, please leave a comment.
Overview of the Issue
Cannot view streaming logs. The github actions check appears but the "details" button redirects to a page with the atlantis logo an a blank black text box.
The browser console shows:
WebSocket connection to 'wss://example.org/jobs/4bf24863-4b49-490f-afe1-d25d44a6530b/ws' failed:
but other, non-websocket requests (icons, css, js, etc) are successful.
Reproduction Steps
- deploy atlantis using helm chart version 3.16.0 (atlantis version 0.19.1)
- run an atlantis plan on PR in github
- click the "details" button in the "checks" section
- connection to websocket fail
Logs
2022-03-09 09:36:07 {"level":"debug","ts":"2022-03-09T01:36:07.393Z","caller":"server/middleware.go:44","msg":"GET /jobs/4bf24863-4b49-490f-afe1-d25d44a6530b – from 127.0.0.1:57890","json":{}}
2022-03-09 09:36:07 {"level":"debug","ts":"2022-03-09T01:36:07.393Z","caller":"server/middleware.go:70","msg":"GET /jobs/4bf24863-4b49-490f-afe1-d25d44a6530b – respond HTTP 200","json":{}}
2022-03-09 09:36:09 {"level":"debug","ts":"2022-03-09T01:36:09.703Z","caller":"server/middleware.go:44","msg":"GET /static/images/atlantis-icon_512.png – from 127.0.0.1:57890","json":{}}
2022-03-09 09:36:09 {"level":"debug","ts":"2022-03-09T01:36:09.705Z","caller":"server/middleware.go:70","msg":"GET /static/images/atlantis-icon_512.png – respond HTTP 200","json":{}}
2022-03-09 09:36:11 {"level":"debug","ts":"2022-03-09T01:36:11.241Z","caller":"server/middleware.go:44","msg":"GET /jobs/4bf24863-4b49-490f-afe1-d25d44a6530b/ws – from 127.0.0.1:57890","json":{}}
2022-03-09 09:36:11 {"level":"error","ts":"2022-03-09T01:36:11.241Z","caller":"logging/simple_logger.go:161","msg":"invalid key: 4bf24863-4b49-490f-afe1-d25d44a6530b","json":{},"stacktrace":"github.com/runatlantis/atlantis/server/logging.(*StructuredLogger).Log
github.com/runatlantis/atlantis/server/logging/simple_logger.go:161
github.com/runatlantis/atlantis/server/controllers.(*JobsController).respond
github.com/runatlantis/atlantis/server/controllers/jobs_controller.go:67
github.com/runatlantis/atlantis/server/controllers.(*JobsController).GetProjectJobsWS
github.com/runatlantis/atlantis/server/controllers/jobs_controller.go:60
net/http.HandlerFunc.ServeHTTP
net/http/server.go:2047
github.com/gorilla/mux.(*Router).ServeHTTP
github.com/gorilla/[email protected]/mux.go:210
github.com/urfave/negroni.Wrap.func1
github.com/urfave/[email protected]/negroni.go:46
github.com/urfave/negroni.HandlerFunc.ServeHTTP
github.com/urfave/[email protected]/negroni.go:29
github.com/urfave/negroni.middleware.ServeHTTP
github.com/urfave/[email protected]/negroni.go:38
github.com/runatlantis/atlantis/server.(*RequestLogger).ServeHTTP
github.com/runatlantis/atlantis/server/middleware.go:68
github.com/urfave/negroni.middleware.ServeHTTP
github.com/urfave/[email protected]/negroni.go:38
github.com/urfave/negroni.(*Recovery).ServeHTTP
github.com/urfave/[email protected]/recovery.go:193
github.com/urfave/negroni.middleware.ServeHTTP
github.com/urfave/[email protected]/negroni.go:38
github.com/urfave/negroni.(*Negroni).ServeHTTP
github.com/urfave/[email protected]/negroni.go:96
net/http.serverHandler.ServeHTTP
net/http/server.go:2879
net/http.(*conn).serve
net/http/server.go:1930"}
2022-03-09 09:36:11 {"level":"debug","ts":"2022-03-09T01:36:11.241Z","caller":"server/middleware.go:70","msg":"GET /jobs/4bf24863-4b49-490f-afe1-d25d44a6530b/ws – respond HTTP 400","json":{}}
2022-03-09 09:36:43 {"level":"debug","ts":"2022-03-09T01:36:43.178Z","caller":"server/middleware.go:44","msg":"GET /healthz – from 127.0.0.1:56884","json":{}}
2022-03-09 09:36:43 {"level":"debug","ts":"2022-03-09T01:36:43.178Z","caller":"server/middleware.go:70","msg":"GET /healthz – respond HTTP 200","json":{}}
2022-03-09 09:35:43 {"level":"debug","ts":"2022-03-09T01:35:43.181Z","caller":"server/middleware.go:70","msg":"GET /healthz – respond HTTP 200","json":{}}
Environment details
-
Atlantis version: 0.19.1
-
helm chart values
test:
enabled: false
logLevel: "debug"
atlantisUrl: https://example.org
defaultTFVersion: 0.15.4
config: |
hide-prev-plan-comments: true
ingress:
enabled: true
path: /
host: gh.example.org
annotations:
kubernetes.io/ingress.class: nginx
kubernetes.io/tls-acme: "true"
# https://api.github.com/meta
nginx.ingress.kubernetes.io/whitelist-source-range: 192.30.252.0/22,185.199.108.0/22,140.82.112.0/20
labels: {}
tls:
- secretName: gh-atlantis-example-org-tls
hosts:
- gh.example.org
repoConfig: |
---
repos:
- id: /.*/
apply_requirements: [approved, mergeable]
workflow: default
allow_custom_workflows: false
workflows:
default:
plan:
steps:
- init
- run: terraform${ATLANTIS_TERRAFORM_VERSION} plan -input=false -refresh -no-color -out $PLANFILE | grep -vF 'Refreshing state...'
apply:
steps: [apply]
resources:
requests:
memory: 1Gi
cpu: 500m
limits:
memory: 1Gi
cpu: 2000m
environmentSecrets:
- name: ATLANTIS_GH_USER
secretKeyRef:
name: ci-ea-atlantis
key: github-user
- name: ATLANTIS_GH_WEBHOOK_SECRET
secretKeyRef:
name: ci-ea-atlantis
key: github-webhook-secret
- name: ATLANTIS_GH_TOKEN
secretKeyRef:
name: ci-ea-atlantis
key: github-token
- name: GITHUB_TOKEN
secretKeyRef:
name: ci-ea-atlantis
key: github-token
- name: ATLANTIS_TFE_TOKEN
secretKeyRef:
name: ci-ea-atlantis
key: terraform-token
- name: ARM_TENANT_ID
secretKeyRef:
name: ci-ea-atlantis
key: terraform-arm-tenant-id
- name: ARM_CLIENT_ID
secretKeyRef:
name: ci-ea-atlantis
key: terraform-arm-client-id
- name: ARM_CLIENT_SECRET
secretKeyRef:
name: ci-ea-atlantis
key: terraform-arm-client-secret
podTemplate:
annotations:
linkerd.io/inject: enabled
labels:
aadpodidbinding: jar-ci-ea-mi-app-atlantis
extraVolumes:
- name: secrets-store-inline
csi:
driver: secrets-store.csi.k8s.io
readOnly: true
volumeAttributes:
secretProviderClass: ci-ea-atlantis
extraVolumeMounts:
- name: secrets-store-inline
mountPath: /mnt/secrets-store
readOnly: true
extraManifests:
- apiVersion: extensions/v1beta1
kind: Ingress
metadata:
name: ci-ea-atlantis-dashboard
labels:
app.kubernetes.io/name: atlantis
app.kubernetes.io/instance: ci-ea-atlantis
annotations:
nginx.ingress.kubernetes.io/upstream-vhost: $service_name.$namespace.svc.cluster.local
kubernetes.io/ingress.class: nginx-internal
kubernetes.io/tls-acme: "true"
spec:
tls:
- secretName: atlantis-example-org-tls
hosts:
- example.org
rules:
- host: example.org
http:
paths:
- path: /
backend:
serviceName: ci-ea-atlantis
servicePort: 80
- apiVersion: secrets-store.csi.x-k8s.io/v1alpha1
kind: SecretProviderClass
metadata:
name: ci-ea-atlantis
spec:
provider: azure
parameters:
usePodIdentity: "true"
tenantId: < id >
keyvaultName: JAR-CI-EA-KV-ATLANTIS
objects: |
array:
- |
objectName: github-user
objectType: secret
- |
objectName: github-webhook-secret
objectType: secret
- |
objectName: github-token
objectType: secret
- |
objectName: terraform-token
objectType: secret
- |
objectName: terraform-arm-tenant-id
objectType: secret
- |
objectName: terraform-arm-client-id
objectType: secret
- |
objectName: terraform-arm-client-secret
objectType: secret
secretObjects:
- secretName: ci-ea-atlantis
type: Opaque
data:
- key: github-user
objectName: github-user
- key: github-webhook-secret
objectName: github-webhook-secret
- key: github-token
objectName: github-token
- key: terraform-token
objectName: terraform-token
- key: terraform-arm-tenant-id
objectName: terraform-arm-tenant-id
- key: terraform-arm-client-id
objectName: terraform-arm-client-id
- key: terraform-arm-client-secret
objectName: terraform-arm-client-secret
Additional Context
I'm not sure if this is an Atlantis issue or maybe an nginx / ingress issue, i would be very thankful for any advice :)
@alex-mozejko we face the same issue. You should be able to rule out ingress/nginx issue by doing a port-forward to the service or pod and then try to call the job endpoint again.
It doesn't change anything on our side though.
I think it's coming from here(that's what the logs indicate), where it checks if the job exists with the given ID: https://github.com/runatlantis/atlantis/blob/master/server/controllers/websocket/mux.go#L58 https://github.com/runatlantis/atlantis/blob/master/server/controllers/jobs_controller.go#L67
For our setup I found the reason. The documentation says that the log stream doesn't work with custom apply / plan commands. We use a Makefile as a wrapper. I got the log stream working now by using the default init/plan/apply from Atlantis.
Ah same here, we're using Terragrunt and encountering the same issue. Just tested on a plain Terraform project using the default workflow and it works perfectly. Custom workflow support would be amazing!
👍 for custom workflow support
I think there are two separate and unrelated issues being discussed in this issue:
- Ingress websocket issues (possibly related to ingress controller config or LB used, eg Classic ELB does not support websockets you need to use NLB or ALB)
- and issues with realtime streaming of logs from custom workflows.
FWIW, Atlantis really needs a configuration option to disable websockets.
is this still happening with v0.19.8
?
Still happend v0.19.8 We used nginx ingress with Azure LB
it has been failed to get wss
output
the wss
worked for me until I made a change when put into environment variable as below,
{
name = "ATLANTIS_TFE_TOKEN"
value = "xxxx"
}
Environment details:
source = "terraform-aws-modules/atlantis/aws" version = "~> 3.0"
module "atlantis" {
source = "terraform-aws-modules/atlantis/aws"
version = "~> 3.0"
name = "atlantis"
# VPC
cidr = "10.20.0.0/16"
azs = ["ap-southeast-1a", "ap-southeast-1b", "ap-southeast-1c"]
private_subnets = ["10.20.1.0/24", "10.20.2.0/24", "10.20.3.0/24"]
public_subnets = ["10.20.101.0/24", "10.20.102.0/24", "10.20.103.0/24"]
# DNS (without trailing dot)
route53_zone_name = "xxxx"
# ACM (SSL certificate) - Specify ARN of an existing certificate or new one will be created and validated using Route53 DNS
certificate_arn = "arn:aws:acm:xxxxxx"
custom_environment_variables = [
{
name : "ATLANTIS_REPO_CONFIG_JSON",
value : jsonencode(yamldecode(file("${path.module}/server.yaml"))),
},
{
name = "ATLANTIS_TFE_TOKEN"
value = "xxxx"
}
]
# Atlantis
atlantis_github_user = "xxx"
atlantis_github_user_token = "xxxx"
atlantis_repo_allowlist = ["xxx"]
policies_arn = ["xxxx"]
user = "100:1000"
}
@hungran i observe the same behavior - logs reflected fine without ATLANTIS_TFE_TOKEN and i have blank screen with it
as a workaround to solve this, instead of using the ATLANTIS_TFE_TOKEN
variable, you can inject it as a volume.
Create a secret from terraformrc file
$ cat ~/.terraformrc
credentials "app.terraform.io" {
token = "xxx"
}
$ kubectl create secret generic atlantis-terraformrc --from-file=~/.terraformrc
Then inject the secret as a volume in the helm chart
extraVolumes:
- name: terraformrc
secret:
secretName: atlantis-terraformrc
defaultMode: 0400
extraVolumeMounts:
- name: terraformrc
mountPath: /home/atlantis/
readOnly: true
We are having the same issue with 0.20.1
running in AWS ECS. We do not have ATLANTIS_TFE_TOKEN
set.
This is what we see in the logs:
{"level":"error","ts":"2022-11-10T11:40:31.070Z","caller":"logging/simple_logger.go:163","msg":"invalid key: 67b1873e-03cd-4574-a903-66a61805bdec","json":{},"stacktrace":"github.com/runatlantis/atlantis/server/logging.(*StructuredLogger).Log\n\tgithub.com/runatlantis/atlantis/server/logging/simple_logger.go:163\ngithub.com/runatlantis/atlantis/server/controllers.(*JobsController).respond\n\tgithub.com/runatlantis/atlantis/server/controllers/jobs_controller.go:92\ngithub.com/runatlantis/atlantis/server/controllers.(*JobsController).getProjectJobsWS\n\tgithub.com/runatlantis/atlantis/server/controllers/jobs_controller.go:70\ngithub.com/runatlantis/atlantis/server/controllers.(*JobsController).GetProjectJobsWS\n\tgithub.com/runatlantis/atlantis/server/controllers/jobs_controller.go:83\nnet/http.HandlerFunc.ServeHTTP\n\tnet/http/server.go:2109\ngithub.com/gorilla/mux.(*Router).ServeHTTP\n\tgithub.com/gorilla/[email protected]/mux.go:210\ngithub.com/urfave/negroni.Wrap.func1\n\tgithub.com/urfave/[email protected]/negroni.go:46\ngithub.com/urfave/negroni.HandlerFunc.ServeHTTP\n\tgithub.com/urfave/[email protected]/negroni.go:29\ngithub.com/urfave/negroni.middleware.ServeHTTP\n\tgithub.com/urfave/[email protected]/negroni.go:38\ngithub.com/runatlantis/atlantis/server.(*RequestLogger).ServeHTTP\n\tgithub.com/runatlantis/atlantis/server/middleware.go:70\ngithub.com/urfave/negroni.middleware.ServeHTTP\n\tgithub.com/urfave/[email protected]/negroni.go:38\ngithub.com/urfave/negroni.(*Recovery).ServeHTTP\n\tgithub.com/urfave/[email protected]/recovery.go:193\ngithub.com/urfave/negroni.middleware.ServeHTTP\n\tgithub.com/urfave/[email protected]/negroni.go:38\ngithub.com/urfave/negroni.(*Negroni).ServeHTTP\n\tgithub.com/urfave/[email protected]/negroni.go:96\nnet/http.serverHandler.ServeHTTP\n\tnet/http/server.go:2947\nnet/http.(*conn).serve\n\tnet/http/server.go:1991"}
{
"level": "error",
"ts": "2022-11-10T11:40:31.070Z",
"caller": "logging/simple_logger.go:163",
"msg": "invalid key: 67b1873e-03cd-4574-a903-66a61805bdec",
"json": {},
"stacktrace": "...see below..."
}
$ echo -E $stacktrace | jq -r .stacktrace
github.com/runatlantis/atlantis/server/logging.(*StructuredLogger).Log
github.com/runatlantis/atlantis/server/logging/simple_logger.go:163
github.com/runatlantis/atlantis/server/controllers.(*JobsController).respond
github.com/runatlantis/atlantis/server/controllers/jobs_controller.go:92
github.com/runatlantis/atlantis/server/controllers.(*JobsController).getProjectJobsWS
github.com/runatlantis/atlantis/server/controllers/jobs_controller.go:70
github.com/runatlantis/atlantis/server/controllers.(*JobsController).GetProjectJobsWS
github.com/runatlantis/atlantis/server/controllers/jobs_controller.go:83
net/http.HandlerFunc.ServeHTTP
net/http/server.go:2109
github.com/gorilla/mux.(*Router).ServeHTTP
github.com/gorilla/[email protected]/mux.go:210
github.com/urfave/negroni.Wrap.func1
github.com/urfave/[email protected]/negroni.go:46
github.com/urfave/negroni.HandlerFunc.ServeHTTP
github.com/urfave/[email protected]/negroni.go:29
github.com/urfave/negroni.middleware.ServeHTTP
github.com/urfave/[email protected]/negroni.go:38
github.com/runatlantis/atlantis/server.(*RequestLogger).ServeHTTP
github.com/runatlantis/atlantis/server/middleware.go:70
github.com/urfave/negroni.middleware.ServeHTTP
github.com/urfave/[email protected]/negroni.go:38
github.com/urfave/negroni.(*Recovery).ServeHTTP
github.com/urfave/[email protected]/recovery.go:193
github.com/urfave/negroni.middleware.ServeHTTP
github.com/urfave/[email protected]/negroni.go:38
github.com/urfave/negroni.(*Negroni).ServeHTTP
github.com/urfave/[email protected]/negroni.go:96
net/http.serverHandler.ServeHTTP
net/http/server.go:2947
net/http.(*conn).serve
net/http/server.go:1991
Hmmm @amontalban seems like the issue thrown for you is in this block
https://github.com/runatlantis/atlantis/blob/4cacaeb0c7633c98a9a5f873faaf391b259bf355/server/controllers/jobs_controller.go#L66-L88
we are using terragrunt for our workflows, and faced the exact same issue as well with the below msg and stacktrace
msg:
invalid key: 7d8ffd20-f9c4-4977-87c7-d11185c6d1b9
stacktrace:
github.com/runatlantis/atlantis/server/logging.(*StructuredLogger).Log
github.com/runatlantis/atlantis/server/logging/simple_logger.go:163
github.com/runatlantis/atlantis/server/controllers.(*JobsController).respond
github.com/runatlantis/atlantis/server/controllers/jobs_controller.go:92
github.com/runatlantis/atlantis/server/controllers.(*JobsController).getProjectJobsWS
github.com/runatlantis/atlantis/server/controllers/jobs_controller.go:70
github.com/runatlantis/atlantis/server/controllers.(*JobsController).GetProjectJobsWS
github.com/runatlantis/atlantis/server/controllers/jobs_controller.go:83
net/http.HandlerFunc.ServeHTTP
net/http/server.go:2109
github.com/gorilla/mux.(*Router).ServeHTTP
github.com/gorilla/[email protected]/mux.go:210
github.com/urfave/negroni.Wrap.func1
github.com/urfave/[email protected]/negroni.go:46
github.com/urfave/negroni.HandlerFunc.ServeHTTP
github.com/urfave/[email protected]/negroni.go:29
github.com/urfave/negroni.middleware.ServeHTTP
github.com/urfave/[email protected]/negroni.go:38
github.com/runatlantis/atlantis/server.(*RequestLogger).ServeHTTP
github.com/runatlantis/atlantis/server/middleware.go:70
github.com/urfave/negroni.middleware.ServeHTTP
github.com/urfave/[email protected]/negroni.go:38
github.com/urfave/negroni.(*Recovery).ServeHTTP
github.com/urfave/[email protected]/recovery.go:193
github.com/urfave/negroni.middleware.ServeHTTP
github.com/urfave/[email protected]/negroni.go:38
github.com/urfave/negroni.(*Negroni).ServeHTTP
github.com/urfave/[email protected]/negroni.go:96
net/http.serverHandler.ServeHTTP
net/http/server.go:2947
net/http.(*conn).serve
net/http/server.go:1991
turns out its because we had incorrect s3 permissions for terragrunt
to read our remote state and the atlantis plan
action was actually hanging at this prompt:
Error running plan operation: running "terragrunt plan -input=false -out=$PLANFILE" in "/home/atlantis/.atlantis/repos//infrastructure/169/default/states/acct.gcc-r-mgmt/ecr": signal: terminated: running "terragrunt plan -input=false -out=$PLANFILE" in "/home/atlantis/.atlantis/repos/infrastructure/169/default/states/acct.gcc-r-mgmt/ecr":
Remote state S3 bucket sst-s3-<redacted> does not exist or you don't have permissions to access it. Would you like Terragrunt to create it? (y/n)
this prompt wasn't shown on the UI's log stream likely because it hadn't been flushed yet as it waiting for user input and there weren't enough lines for the buffer to flush. we realised it only when we restarted our ECS services that forced the buffer to flush and then we saw the prompt on our ECS logs.
once our s3 permissions was fixed, the websocket responded fine and we could see our log streams:
@amontalban maybe this might help!
repos.yml for reference
repos:
- id: "/.*/"
workflow: terragrunt
apply_requirements: [ approved, mergeable ]
workflows:
terragrunt:
plan:
steps:
- env:
name: TERRAGRUNT_TFPATH
command: 'echo "terraform${ATLANTIS_TERRAFORM_VERSION}"'
- run: terragrunt plan -input=false -out=$PLANFILE
- run: terragrunt show -json $PLANFILE > $SHOWFILE
apply:
steps:
- env:
name: TERRAGRUNT_TFPATH
command: 'echo "terraform${ATLANTIS_TERRAFORM_VERSION}"'
- run: terragrunt apply -input=false $PLANFILE
This may be related to https://github.com/runatlantis/atlantis/issues/2026
Please try @houdinisparks method and use the latest version if possible.
@amontalban @prastamaha @anton-livewyer @hungran @macropin @davidh-unmind @jonwtech @claas-fridtjof-lisowski
This issue is still present onatlantis v0.24.0
, removing the tfe token did indeed fix it.
we solved the issue by this ATLANTIS_TFE_LOCAL_EXECUTION_MODE=true
Same issue with ws connection failure and "invalid key", but we have never specified ATLANTIS_TFE_TOKEN
. Currently we just run terraform, nothing fancy. It's weird because other commits I've made worked, it's just my latest commit that ran into the issue.
We're using v0.24.4