Streaming does not appear to work properly with ALB
Overview of the Issue
When performing a plan or apply, the link that is given to view the live stream doesn't appear to show anything running; just a blank black box.
(looks like similar issues were posted in the original PR: https://github.com/runatlantis/atlantis/pull/1937)
Atlantis Version: 0.18.2
If the streaming window is already open, it will print out -----Starting New Process-----, but that is all.
Based on the logs, it is getting a broken pipe, possibly because an ALB is being used?
Reproduction Steps
1. Run an atlantis plan or apply
2. Go to the streaming URL
Logs
Environment details
Atlantis: 0.18.2
Additional Context
We are running Atlantis as a single Docker container on an AWS ECS cluster with an ALB in front of it.
We're experiencing the same issue... Atlantis does give us some logs:
{
"level": "error",
"ts": "2022-01-27T21:59:48.706Z",
"caller": "logging/simple_logger.go:161",
"msg": "writing to ws 2uinc/atlantis-test-repo/15/iam/default: upgrading websocket connection: websocket: the client is not using the websocket protocol: 'upgrade' token not found in 'Connection' header",
"json": {},
"stacktrace": "github.com/runatlantis/atlantis/server/logging.(*StructuredLogger).Log\n\tgithub.com/runatlantis/atlantis/server/logging/simple_logger.go:161\ngithub.com/runatlantis/atlantis/server/controllers.(*JobsController).respond\n\tgithub.com/runatlantis/atlantis/server/controllers/jobs_controller.go:141\ngithub.com/runatlantis/atlantis/server/controllers.(*JobsController).GetProjectJobsWS\n\tgithub.com/runatlantis/atlantis/server/controllers/jobs_controller.go:134\nnet/http.HandlerFunc.ServeHTTP\n\tnet/http/server.go:2047\ngithub.com/gorilla/mux.(*Router).ServeHTTP\n\tgithub.com/gorilla/[email protected]/mux.go:210\ngithub.com/urfave/negroni.Wrap.func1\n\tgithub.com/urfave/[email protected]/negroni.go:46\ngithub.com/urfave/negroni.HandlerFunc.ServeHTTP\n\tgithub.com/urfave/[email protected]/negroni.go:29\ngithub.com/urfave/negroni.middleware.ServeHTTP\n\tgithub.com/urfave/[email protected]/negroni.go:38\ngithub.com/runatlantis/atlantis/server.(*RequestLogger).ServeHTTP\n\tgithub.com/runatlantis/atlantis/server/middleware.go:69\ngithub.com/urfave/negroni.middleware.ServeHTTP\n\tgithub.com/urfave/[email protected]/negroni.go:38\ngithub.com/urfave/negroni.(*Recovery).ServeHTTP\n\tgithub.com/urfave/[email protected]/recovery.go:193\ngithub.com/urfave/negroni.middleware.ServeHTTP\n\tgithub.com/urfave/[email protected]/negroni.go:38\ngithub.com/urfave/negroni.(*Negroni).ServeHTTP\n\tgithub.com/urfave/[email protected]/negroni.go:96\nnet/http.serverHandler.ServeHTTP\n\tnet/http/server.go:2879\nnet/http.(*conn).serve\n\tnet/http/server.go:1930"
}
I'm not sure what to make of them, though. Hope this helps, because the devs will be stoked to have this feature working.
I see the same issue in AKS with the AKS LB and an Azure App Gateway.
Seeing the same here, with an ALB in front of EC2.
Same here.
EKS + ELB + Traefik
Got the same issue with ALB + ECS in AWS.
Same issue, but when accessing a cloud VM directly via https://11.22.33.44:9100/jobs/...
Anything about this? I get a 200 in response but the JSON is empty:
{"level":"debug","ts":"2022-05-22T17:14:45.011Z","caller":"server/middleware.go:44","msg":"GET /jobs/.../735/terraform/default/ws – from 172.30.10.164:38882","json":{}}
and after 60 sec I get this (which is the timeout on the LB):
{"level":"warn","ts":"2022-05-22T17:15:45.011Z","caller":"websocket/writer.go:62","msg":"Failed to read WS message: websocket: close 1006 (abnormal closure): unexpected EOF","json":{},"stacktrace":"github.com/runatlantis/atlantis/server/controllers/websocket.(*Writer).setReadHandler\n\tgithub.com/runatlantis/atlantis/server/controllers/websocket/writer.go:62"}
- Increasing the timeout on the LB did not help.
- We are running Atlantis without any wrappers, just regular terraform commands.
Same here, we use custom workflows. The issue persists with and without proxies (i.e. with k8s port-forward the issue is the same). It could still be related to custom workflows, since according to the documentation streaming is only supported for regular terraform commands: https://www.runatlantis.io/docs/streaming-logs.html#real-time-logs. However, it would be really nice to have that working for the other cases too.
Is this still happening with v0.19.8?
@jamengual not for us :(
@jamengual yes, this is still an issue with v0.19.8 and does not appear to be related to ALB or proxy configuration. We are running with an ALB -> ECS Fargate and I was able to reproduce the issue with and without the ALB in play.
ok, we will look into this.
In our case this was not the AWS ALB; instead it was the nginx sidecar container we have running alongside Atlantis in ECS Fargate.
Adding the following settings in nginx.conf resolved the issue for us.
location / {
# redirect all traffic to the backend
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
proxy_set_header Host $http_host;
proxy_pass http://${APP}:${APP_PORT};
# WebSocket support
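# These two headers are what the earlier error
# ("'upgrade' token not found in 'Connection' header") is complaining about:
# without the proxy forwarding them, Atlantis cannot upgrade the request
# to a WebSocket connection.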
proxy_http_version 1.1;
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection "Upgrade";
}
@jamengual I have this working successfully with ALB & ECS and custom workflows, no additional config needed.
Did you do any specific ALB configs? Do you have tf code you can show that could help people looking at this issue?
@evanstachowiak any specific configuration? We still can't get it working properly.
I'm not sure what sort of config to include here. I have an ALB that forwards 443 -> 4141 to the Atlantis target group. I don't have nginx running or any other proxy in between.
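For anyone comparing setups, here is a minimal Terraform sketch of that kind of ALB -> Atlantis wiring (every name, variable, and the health check path below is an illustrative placeholder, not the actual config from this comment):

resource "aws_lb" "atlantis" {
  name               = "atlantis"
  load_balancer_type = "application"
  subnets            = var.public_subnet_ids
  security_groups    = var.alb_security_group_ids

  # ALBs proxy WebSocket upgrades natively; the main knob that matters for
  # streaming is the idle timeout, which closes quiet connections (default 60s).
  idle_timeout = 400
}

resource "aws_lb_target_group" "atlantis" {
  name        = "atlantis"
  port        = 4141  # Atlantis' default port
  protocol    = "HTTP"
  target_type = "ip"  # ECS Fargate tasks register by IP
  vpc_id      = var.vpc_id

  health_check {
    path = "/healthz"  # Atlantis health endpoint
  }
}

resource "aws_lb_listener" "https" {
  load_balancer_arn = aws_lb.atlantis.arn
  port              = 443
  protocol          = "HTTPS"
  certificate_arn   = var.certificate_arn

  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.atlantis.arn
  }
}

Note that nothing WebSocket-specific is configured on the ALB itself, which matches the reports above that it works with no additional config.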
Same here, External HTTPS Load Balancer and VM on GCP.
I bet it's something to do with the configuration of the load balancer. I have it working on a load balancer in AWS.
Is stickiness enabled on the load balancer?
https://stackoverflow.com/a/40423241/2965993
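If stickiness does turn out to matter (it only would with more than one registered Atlantis target, presumably so the WebSocket reaches the same instance that ran the job), it is a single block on the target group. A hypothetical Terraform sketch:

resource "aws_lb_target_group" "atlantis" {
  name     = "atlantis"
  port     = 4141
  protocol = "HTTP"
  vpc_id   = var.vpc_id

  # Pin a given browser session to one target via an ALB-generated cookie.
  stickiness {
    type            = "lb_cookie"
    enabled         = true
    cookie_duration = 86400  # seconds
  }
}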
Has anyone contacted AWS or GCP support to figure out what the issue could be?
I have deployed about 10+ Atlantis servers in aws using ALBs and never had a problem with the log streaming.
Some people have reported corporate firewalls denying the connection, bad configuration in the LBs, or antivirus firewalls causing issues, but that is not on the Atlantis side.
It probably isn't related to Atlantis, but I do think that if this many people run into this, we need to sort out what the common pitfall is and document it properly in the Deployment section.
> Anything about this? I get a 200 in response but the JSON is empty:
> {"level":"debug","ts":"2022-05-22T17:14:45.011Z","caller":"server/middleware.go:44","msg":"GET /jobs/.../735/terraform/default/ws – from 172.30.10.164:38882","json":{}}
> and after 60 sec I get this (which is the timeout on the LB):
> {"level":"warn","ts":"2022-05-22T17:15:45.011Z","caller":"websocket/writer.go:62","msg":"Failed to read WS message: websocket: close 1006 (abnormal closure): unexpected EOF","json":{},"stacktrace":"github.com/runatlantis/atlantis/server/controllers/websocket.(*Writer).setReadHandler\n\tgithub.com/runatlantis/atlantis/server/controllers/websocket/writer.go:62"}
> - Increasing the timeout on the LB did not help.
> - We are running Atlantis without any wrappers, just regular terraform commands.
Just an update from my side: per the given error, it appears that I was running an old version of Atlantis (v0.18.2). Updated to the newest available stable version and the issue is gone, as I'm able to get the streamed logs.
That's great @mcrivar! Thank you for sharing.
For all others who are running into issues, could you folks use the latest version and confirm if the issue is still present?
cc @askmike1 @s33dunda @MattMencel @davidh-unmind @gmontanola @pantelis-karamolegkos @adutchak @spamoom
This was 100% to do with WebSockets for me. Once I configured WebSocket support on my LB/Ingress it started working. See #2216
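For the Ingress case, here is a hypothetical sketch of what "configuring websockets" can look like with ingress-nginx, written as a Terraform kubernetes_ingress_v1 resource (the host, service name, and timeout values are assumptions, and #2216 may describe a different setup). ingress-nginx proxies WebSocket upgrades by default, but its 60-second proxy read/send timeouts will cut an idle streaming connection, so the annotations raise them:

resource "kubernetes_ingress_v1" "atlantis" {
  metadata {
    name = "atlantis"
    annotations = {
      # Keep the streaming WebSocket open longer than the 60s default.
      "nginx.ingress.kubernetes.io/proxy-read-timeout" = "3600"
      "nginx.ingress.kubernetes.io/proxy-send-timeout" = "3600"
    }
  }

  spec {
    ingress_class_name = "nginx"

    rule {
      host = "atlantis.example.com"

      http {
        path {
          path      = "/"
          path_type = "Prefix"

          backend {
            service {
              name = "atlantis"
              port {
                number = 4141
              }
            }
          }
        }
      }
    }
  }
}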
Running version 0.22.3, this is still happening in GCP, running in a VM. Sometimes we get some data, sometimes none. There is no specific config for WebSockets in GCP LBs.
@bschaatsbergen do you see the GCP WebSocket issue using your GCE Atlantis module?
https://github.com/bschaatsbergen/terraform-gce-atlantis
@nitrocode I found out the issue for Google Cloud users: when using Identity-Aware Proxy (to protect the Atlantis UI), WebSockets are not supported; the bearer authorization header is stripped off.
Ah that's good to know. Thank you for closing the loop on that.
Is this relevant for that?
https://cloud.google.com/iap/docs/authentication-howto#authenticating_from_proxy-authorization_header
I wonder if other people in this thread are running into similar issues, where something that fronts Atlantis is manipulating the bearer authorization header, which leads to the WebSocket failure.
This issue is stale because it has been open for 1 month with no activity. Remove stale label or comment or this will be closed in 1 month.
> @nitrocode I found out the issue for Google Cloud users: when using Identity-Aware Proxy (to protect the Atlantis UI), WebSockets are not supported; the bearer authorization header is stripped off.
@bschaatsbergen I'm seeing something similar: the "live plan" view in the console UI sometimes doesn't display the full plan (especially for larger plans). I used your Terraform module for the GCE setup.
Any ideas for a workaround? Could I put nginx in front of the Atlantis Docker container to deal with the authorization header issue?