Streaming does not appear to work properly with ALB

Open askmike1 opened this issue 3 years ago • 15 comments

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request. Searching for pre-existing feature requests helps us consolidate datapoints for identical requirements into a single place, thank you!
  • Please do not leave "+1" or other comments that do not add relevant new information or questions, they generate extra noise for issue followers and do not help prioritize the request.
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment.

Overview of the Issue

When performing a plan or apply, the link that is given to view the live stream doesn't appear to show anything, just a blank black box.

image

(looks like similar issues posted in the original PR - https://github.com/runatlantis/atlantis/pull/1937)

Atlantis Version: 0.18.2

If the streaming window is already open, it will print out -----Starting New Process-----, but that is all

Based on the logs, it is getting a broken pipe possibly because an ALB is being used?

Reproduction Steps

  • Run an atlantis plan or apply
  • Go to streaming url [No input]

Logs

Environment details

Atlantis: 0.18.2

Additional Context

We are running Atlantis as a single Docker container on an AWS ECS cluster with an ALB in front of it

askmike1 avatar Jan 28 '22 17:01 askmike1

we're experiencing the same issue... atlantis does give us some logs:

{
  "level": "error",
  "ts": "2022-01-27T21:59:48.706Z",
  "caller": "logging/simple_logger.go:161",
  "msg": "writing to ws 2uinc/atlantis-test-repo/15/iam/default: upgrading websocket connection: websocket: the client is not using the websocket protocol: 'upgrade' token not found in 'Connection' header",
  "json": {},
  "stacktrace": "github.com/runatlantis/atlantis/server/logging.(*StructuredLogger).Log\n\tgithub.com/runatlantis/atlantis/server/logging/simple_logger.go:161\ngithub.com/runatlantis/atlantis/server/controllers.(*JobsController).respond\n\tgithub.com/runatlantis/atlantis/server/controllers/jobs_controller.go:141\ngithub.com/runatlantis/atlantis/server/controllers.(*JobsController).GetProjectJobsWS\n\tgithub.com/runatlantis/atlantis/server/controllers/jobs_controller.go:134\nnet/http.HandlerFunc.ServeHTTP\n\tnet/http/server.go:2047\ngithub.com/gorilla/mux.(*Router).ServeHTTP\n\tgithub.com/gorilla/[email protected]/mux.go:210\ngithub.com/urfave/negroni.Wrap.func1\n\tgithub.com/urfave/[email protected]/negroni.go:46\ngithub.com/urfave/negroni.HandlerFunc.ServeHTTP\n\tgithub.com/urfave/[email protected]/negroni.go:29\ngithub.com/urfave/negroni.middleware.ServeHTTP\n\tgithub.com/urfave/[email protected]/negroni.go:38\ngithub.com/runatlantis/atlantis/server.(*RequestLogger).ServeHTTP\n\tgithub.com/runatlantis/atlantis/server/middleware.go:69\ngithub.com/urfave/negroni.middleware.ServeHTTP\n\tgithub.com/urfave/[email protected]/negroni.go:38\ngithub.com/urfave/negroni.(*Recovery).ServeHTTP\n\tgithub.com/urfave/[email protected]/recovery.go:193\ngithub.com/urfave/negroni.middleware.ServeHTTP\n\tgithub.com/urfave/[email protected]/negroni.go:38\ngithub.com/urfave/negroni.(*Negroni).ServeHTTP\n\tgithub.com/urfave/[email protected]/negroni.go:96\nnet/http.serverHandler.ServeHTTP\n\tnet/http/server.go:2879\nnet/http.(*conn).serve\n\tnet/http/server.go:1930"
}

I'm not sure what to make of them though. Hope this helps, cause the devs will be stoked to have this feature working.
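For readers trying to interpret the log above: the message says the request reached Atlantis without an `upgrade` token in its `Connection` header, which a WebSocket handshake needs. Below is a rough Python sketch of that kind of check. It is only an approximation of what a server-side WebSocket library does, and the function names are made up for illustration; this is not Atlantis code.

```python
from typing import Dict, List, Set


def header_tokens(value: str) -> Set[str]:
    """Split a comma-separated header value into lowercase tokens."""
    return {token.strip().lower() for token in value.split(",") if token.strip()}


def websocket_upgrade_problems(headers: Dict[str, str]) -> List[str]:
    """Return the reasons a request would not qualify as a WebSocket upgrade.

    Hypothetical helper: it only looks at the Connection and Upgrade headers.
    """
    lowered = {name.lower(): value for name, value in headers.items()}
    problems = []
    if "upgrade" not in header_tokens(lowered.get("connection", "")):
        # Same condition as the error message in the log above.
        problems.append("'upgrade' token not found in 'Connection' header")
    if "websocket" not in header_tokens(lowered.get("upgrade", "")):
        problems.append("'Upgrade' header does not contain 'websocket'")
    return problems


# What a browser sends, and what may arrive if a proxy drops hop-by-hop headers.
from_browser = {"Connection": "keep-alive, Upgrade", "Upgrade": "websocket"}
after_proxy = {"Connection": "close"}

print(websocket_upgrade_problems(from_browser))  # []
print(len(websocket_upgrade_problems(after_proxy)))  # 2
```

If the headers seen by Atlantis look like the second case, something between the browser and Atlantis is probably not forwarding the upgrade headers.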

s33dunda avatar Jan 31 '22 18:01 s33dunda

I see the same issue in AKS with the AKS LB and an Azure App Gateway.

MattMencel avatar Feb 03 '22 16:02 MattMencel

See the same here. With ALB in front of EC2.

david-heward-unmind avatar Feb 04 '22 18:02 david-heward-unmind

Same here.

EKS + ELB + Traefik

gmontanola avatar Feb 09 '22 19:02 gmontanola

got the same issue with ALB + ECS in AWS.

mcrivar avatar Mar 07 '22 20:03 mcrivar

Same issue, but when accessing a cloud VM directly via https://11.22.33.44:9100/jobs/...

pantelis-karamolegkos avatar Mar 11 '22 08:03 pantelis-karamolegkos

anything about this? I get 200 in response but the json is empty:

    {"level":"debug","ts":"2022-05-22T17:14:45.011Z","caller":"server/middleware.go:44","msg":"GET /jobs/.../735/terraform/default/ws – from 172.30.10.164:38882","json":{}}

and after 60 sec I get this (which is the timeout on the LB):

    {"level":"warn","ts":"2022-05-22T17:15:45.011Z","caller":"websocket/writer.go:62","msg":"Failed to read WS message: websocket: close 1006 (abnormal closure): unexpected EOF","json":{},"stacktrace":"github.com/runatlantis/atlantis/server/controllers/websocket.(*Writer).setReadHandler\n\tgithub.com/runatlantis/atlantis/server/controllers/websocket/writer.go:62"}

  • Increasing timeout on the LB did not help
  • Running Atlantis without any wrappers - regular terraform commands.
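The 60 second gap mentioned above can be read straight from the `ts` fields of the two log lines. Here is a small Python sketch of that arithmetic, assuming the timestamp layout shown in those lines; `seconds_between` is a hypothetical helper name, and the log lines are trimmed to the fields the sketch reads.

```python
import json
from datetime import datetime

# Timestamp layout used by the log lines quoted above (UTC, millisecond precision).
TS_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"


def seconds_between(first_line: str, second_line: str) -> float:
    """Return the seconds elapsed between two JSON log lines, based on "ts"."""
    first = datetime.strptime(json.loads(first_line)["ts"], TS_FORMAT)
    second = datetime.strptime(json.loads(second_line)["ts"], TS_FORMAT)
    return (second - first).total_seconds()


# The two log lines from the comment above, trimmed for brevity.
request_line = '{"level":"debug","ts":"2022-05-22T17:14:45.011Z","caller":"server/middleware.go:44"}'
failure_line = '{"level":"warn","ts":"2022-05-22T17:15:45.011Z","caller":"websocket/writer.go:62"}'

print(seconds_between(request_line, failure_line))  # 60.0
```

A gap equal to the LB idle timeout only suggests the connection sat idle until the LB dropped it. It does not explain why no data flowed in the first place.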

mcrivar avatar May 22 '22 17:05 mcrivar

Same here, we use custom workflows. The issue persists with and without proxies (e.g. with k8s port-forward the issue is the same). It could still be related to custom workflows, since according to the documentation streaming is only supported for regular terraform commands: https://www.runatlantis.io/docs/streaming-logs.html#real-time-logs. However, it would be really nice to have that working for the other cases too.

adutchak avatar May 24 '22 10:05 adutchak

is this still happening with v0.19.8?

jamengual avatar Aug 26 '22 03:08 jamengual

@jamengual not for us :(

2022-08-26 at 08 43 17@2x

spamoom avatar Aug 26 '22 07:08 spamoom

@jamengual yes this is still an issue with v0.19.8 and does not appear to be related to ALB or proxy configuration. We are running with an ALB -> ECS Fargate and I was able to reproduce the issue with and without the ALB in play.

jreslock avatar Aug 26 '22 15:08 jreslock

ok, we will look into this.

jamengual avatar Aug 26 '22 18:08 jamengual

In our case this was not the AWS ALB but instead it was the nginx sidecar container we have running alongside atlantis in ECS Fargate.

Adding the following settings in nginx.conf resolved the issue for us.

    location / {
      # redirect all traffic to the backend
      proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
      proxy_set_header X-Forwarded-Proto $scheme;
      proxy_set_header Host $http_host;
      proxy_pass http://${APP}:${APP_PORT};

      # WebSocket support
      proxy_http_version 1.1;
      proxy_set_header Upgrade $http_upgrade;
      proxy_set_header Connection "Upgrade";
    }
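The `Upgrade` and `Connection` headers in the config above are what let the WebSocket handshake pass through nginx. One way to tell whether a handshake actually completed is to inspect the response: per RFC 6455, a successful upgrade answers with status 101 and a `Sec-WebSocket-Accept` value derived from the client's `Sec-WebSocket-Key`. Below is a minimal Python sketch of that check. The helper names `expected_accept` and `is_upgraded` are hypothetical, and capturing the response (browser dev tools, curl, etc.) is left out.

```python
import base64
import hashlib
from typing import Dict

# Fixed GUID defined by RFC 6455 for computing Sec-WebSocket-Accept.
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def expected_accept(key: str) -> str:
    """Compute the Sec-WebSocket-Accept value for a given Sec-WebSocket-Key."""
    digest = hashlib.sha1((key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


def is_upgraded(status_line: str, headers: Dict[str, str], key: str) -> bool:
    """Return True if the response looks like a completed WebSocket handshake."""
    parts = status_line.split()
    if len(parts) < 2 or parts[1] != "101":
        return False  # e.g. a plain "HTTP/1.1 200 OK" is not an upgrade
    lowered = {name.lower(): value for name, value in headers.items()}
    return lowered.get("sec-websocket-accept", "") == expected_accept(key)


# Sample key taken from the example handshake in RFC 6455.
sample_key = "dGhlIHNhbXBsZSBub25jZQ=="
print(expected_accept(sample_key))  # s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```

A plain 200 on the `/ws` URL, as reported earlier in this thread, would fail this check. That may hint that something in front of Atlantis handled the request as ordinary HTTP.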

jreslock avatar Aug 29 '22 14:08 jreslock

@jamengual I have this working successfully with ALB & ECS and custom workflows, no additional config needed.

evanstachowiak avatar Aug 30 '22 15:08 evanstachowiak

did you do any specific alb configs?

do you have tf code you can show that could help people looking at this issue?

jamengual avatar Aug 30 '22 15:08 jamengual

@evanstachowiak any specific configuration? We still fail to have it working properly.

mcrivar avatar Oct 03 '22 11:10 mcrivar

I'm not sure what sort of config to include here.

I have an alb that forwards 443 -> 4141 with atlantis target group.

image

I don't have nginx running or any other proxy in between.

evanstachowiak avatar Oct 07 '22 12:10 evanstachowiak

Same here, External HTTPS Load Balancer and VM on GCP.

bschaatsbergen avatar Dec 31 '22 20:12 bschaatsbergen

I bet it's something to do with the configuration of the load balancer. I have it working on a load balancer in AWS.

Is stickiness enabled on the load balancer?

https://stackoverflow.com/a/40423241/2965993

Has anyone contacted aws or gcp support to figure out what the issue could be?

nitrocode avatar Jan 01 '23 15:01 nitrocode

I have deployed about 10+ Atlantis servers in aws using ALBs and never had a problem with the log streaming.

some people have reported corporate firewalls denying the connection, bad configuration in the LBs, or antivirus firewalls causing issues, but none of that is on the Atlantis side.

jamengual avatar Jan 01 '23 18:01 jamengual

It probably isn't related to Atlantis, but I do think that if this many people run into this we need to sort out what that common pitfall is and document it properly in the Deployment section.

bschaatsbergen avatar Jan 02 '23 10:01 bschaatsbergen

anything about this? I get 200 in response but the json is empty: {"level":"debug","ts":"2022-05-22T17:14:45.011Z","caller":"server/middleware.go:44","msg":"GET /jobs/.../735/terraform/default/ws – from 172.30.10.164:38882","json":{}}

and after 60 sec I get this (which is the timeout on the LB): {"level":"warn","ts":"2022-05-22T17:15:45.011Z","caller":"websocket/writer.go:62","msg":"Failed to read WS message: websocket: close 1006 (abnormal closure): unexpected EOF","json":{},"stacktrace":"github.com/runatlantis/atlantis/server/controllers/websocket.(*Writer).setReadHandler\n\tgithub.com/runatlantis/atlantis/server/controllers/websocket/writer.go:62"}

  • Increasing timeout on the LB did not help
  • Running Atlantis without any wrappers - regular terraform commands.

Just an update from my side: per the given error, it appears that I was running an old version of atlantis, v0.18.2. I updated to the newest available stable version and the issue is gone; I'm now able to get the streamed logs.

mcrivar avatar Jan 17 '23 08:01 mcrivar

That's great @mcrivar! Thank you for sharing.

For all others who are running into issues, could you folks use the latest version and confirm if the issue is still present?

cc @askmike1 @s33dunda @MattMencel @davidh-unmind @gmontanola @pantelis-karamolegkos @adutchak @spamoom

nitrocode avatar Jan 17 '23 13:01 nitrocode

This was 100% to do with websockets for me. Once I configured websockets on my LB/Ingress it started working. See #2216

MattMencel avatar Jan 18 '23 03:01 MattMencel

Running version 0.22.3, this is still happening in GCP, running in a VM. Sometimes we get some data, sometimes none. There is no specific config for websockets in GCP LBs.

DomFourn avatar Jan 26 '23 17:01 DomFourn

@bschaatsbergen do you see the gcp web socket issue using your gce Atlantis module?

https://github.com/bschaatsbergen/terraform-gce-atlantis

nitrocode avatar Jan 27 '23 00:01 nitrocode

@nitrocode I found out the issue for Google Cloud users, when using Identity Aware Proxy (to protect the Atlantis UI) websockets are not supported.. the bearer authorization header is stripped off.

bschaatsbergen avatar Feb 10 '23 13:02 bschaatsbergen

Ah that's good to know. Thank you for closing the loop on that.

Is this relevant to that?

https://cloud.google.com/iap/docs/authentication-howto#authenticating_from_proxy-authorization_header

I wonder if other people in this thread are running into similar issues where something that fronts atlantis is manipulating the bearer authorization header which leads to the websocket failure.
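One way to test that hypothesis is to compare the headers the client sent with the headers that actually arrived behind the proxy. A small Python sketch of such a comparison follows. The function name and the sample header values are invented for illustration, and how the backend-side headers get captured (a debug endpoint, a packet capture, proxy logs) is an assumption left to the reader.

```python
from typing import Dict, List


def headers_changed_in_transit(
    sent: Dict[str, str], received: Dict[str, str]
) -> Dict[str, List[str]]:
    """List headers that were dropped or altered between client and backend.

    Header names are compared case-insensitively; values are compared exactly.
    """
    sent_lower = {name.lower(): value for name, value in sent.items()}
    received_lower = {name.lower(): value for name, value in received.items()}
    dropped = sorted(name for name in sent_lower if name not in received_lower)
    altered = sorted(
        name
        for name in sent_lower
        if name in received_lower and sent_lower[name] != received_lower[name]
    )
    return {"dropped": dropped, "altered": altered}


# Illustrative values only: what a client might send vs. what a backend might see.
sent = {
    "Authorization": "Bearer example-token",
    "Upgrade": "websocket",
    "Connection": "Upgrade",
}
received = {"upgrade": "websocket", "connection": "close"}

print(headers_changed_in_transit(sent, received))
# {'dropped': ['authorization'], 'altered': ['connection']}
```

An `authorization` entry under `dropped` would match the stripped-header behavior described above, while an altered `connection` header would point at the upgrade itself being blocked.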

nitrocode avatar Feb 11 '23 15:02 nitrocode

This issue is stale because it has been open for 1 month with no activity. Remove stale label or comment or this will be closed in 1 month.

github-actions[bot] avatar Mar 15 '23 01:03 github-actions[bot]

@nitrocode I found out the issue for Google Cloud users, when using Identity Aware Proxy (to protect the Atlantis UI) websockets are not supported.. the bearer authorization header is stripped off.

@bschaatsbergen I'm seeing something similar where the "live plan" view via the console UI has strange behavior where sometimes the full plan doesn't display (especially for larger plans). I used your terraform module for GCE setup.

Any ideas for a workaround? Could I put nginx in front of the atlantis docker container to deal with the authorization header issue?

scott-standard avatar May 24 '23 22:05 scott-standard