Rails WSS connection broken
Hello,
I recently moved my production app from Dokku to Kamal and I'm loving it so far, but one Rails feature I've been using broke in the process: ActionCable.
Basically, after the migration, my frontend can no longer connect to the /cable endpoint. The browser console says: WebSocket connection to 'wss://domain.com/cable' failed: with nothing after the colon to explain why it failed. The Rails log also shows nothing for the /cable route. The only suspicious thing I can see in the Traefik logs is lines like this:
2024-03-22T18:56:17.441214126Z time="2024-03-22T18:56:17Z" level=debug msg="'499 Client Closed Request' caused by: context canceled"
But I'm not sure that's related: this app is already in production with decent traffic, so I can't tell whether those lines correspond to the failing wss connections. The number of these log lines doesn't match the number of failed wss connection retries, though, so I'd say that's not it.
My kamal config looks as follows:
service: myapp
image: mbajur/myapp
volumes:
  - "/home/app/myapp-cache:/app/tmp/cache"
  - "/home/app/myapp-shared:/app/shared"
  - "/home/app/myapp-storage:/app/storage"
servers:
  web:
    hosts:
      - x.x.x.x
    labels:
      traefik.http.routers.myapp.entrypoints: websecure
      traefik.http.routers.myapp.rule: Host(`domain.com`)
      traefik.http.routers.myapp.tls.certresolver: letsencrypt
    options:
      network: "private"
  job:
    hosts:
      - x.x.x.x
    cmd: bundle exec rake solid_queue:start
    options:
      network: "private"
  clock:
    hosts:
      - x.x.x.x
    cmd: bundle exec clockwork clock.rb
    options:
      network: "private"
registry:
  server: ghcr.io
  username: mbajur
  password:
    - KAMAL_REGISTRY_PASSWORD
# Inject ENV variables into containers (secrets come from .env).
# Remember to run `kamal env push` after making changes!
env:
  clear:
    HOSTNAME: domain.com
    APP_DOMAIN: domain.com
    DB_HOST: x.x.x.x
    RAILS_SERVE_STATIC_FILES: true
    RAILS_LOG_TO_STDOUT: true
    ARTISTS_TAXONOMY_ID: 9
    CATEGORIES_TAXONOMY_ID: 8
    PATTERNS_TAXONOMY_ID: 10
    FLIPPER_PSTORE_PATH: shared/flipper.pstore
  secret:
    - POSTGRES_PASSWORD
    - RAILS_MASTER_KEY
ssh:
  user: app
builder:
  dockerfile: Dockerfile.production
  multiarch: false
  cache:
    type: registry
accessories:
  db:
    image: postgres:15
    host: x.x.x.x
    port: 5432
    env:
      clear:
        POSTGRES_USER: "myapp"
        POSTGRES_DB: 'myapp_production'
      secret:
        - POSTGRES_PASSWORD
    files:
      - config/init.sql:/docker-entrypoint-initdb.d/setup.sql
    directories:
      - data:/var/lib/postgresql/data
    options:
      network: "private"
traefik:
  args:
    accesslog: true
  options:
    network: "private"
    publish:
      - "443:443"
    volume:
      - "/letsencrypt/acme.json:/letsencrypt/acme.json"
  args:
    entryPoints.web.address: ":80"
    entryPoints.websecure.address: ":443"
    entryPoints.web.http.redirections.entryPoint.to: websecure # We want to force https
    entryPoints.web.http.redirections.entryPoint.scheme: https
    entryPoints.web.http.redirections.entrypoint.permanent: true
    certificatesResolvers.letsencrypt.acme.email: "[email protected]"
    certificatesResolvers.letsencrypt.acme.storage: "/letsencrypt/acme.json" # Must match the path in `volume`
    certificatesResolvers.letsencrypt.acme.httpchallenge: true
    certificatesResolvers.letsencrypt.acme.httpchallenge.entrypoint: web
healthcheck:
  path: /health/ready
  port: 4000
  max_attempts: 15
# Bridge fingerprinted assets, like JS and CSS, between versions to avoid
# hitting 404 on in-flight requests. Combines all files from new and old
# version inside the asset_path.
# asset_path: /rails/public/assets
# Configure rolling deploys by setting a wait time between batches of restarts.
# boot:
#   limit: 10 # Can also specify as a percentage of total hosts, such as "25%"
#   wait: 2
# Configure the role used to determine the primary_host. This host takes
# deploy locks, runs health checks during the deploy, and follow logs, etc.
#
# Caution: there's no support for role renaming yet, so be careful to cleanup
# the previous role on the deployed hosts.
# primary_role: web
# Controls if we abort when see a role with no hosts. Disabling this may be
# useful for more complex deploy configurations.
#
# allow_empty_roles: false
Thank you in advance for any clues!
Hi @mbajur - you might be able to get some more insight into this by enabling Traefik access logs:
traefik:
  args:
    accesslog: true
    accesslog.format: json
    # Include HTTP headers in logs like so:
    accesslog.fields.headers.names.User-Agent: keep
There's more documentation here - https://doc.traefik.io/traefik/observability/access-logs/.
@djmb sadly, enabling that doesn't show anything more in the Traefik logs. It's as if the request never reaches the server.
@mbajur - did you reboot Traefik (bin/kamal traefik reboot)? You'll need to do that to update the settings.
@djmb Ah, right, good catch :) I rebooted and now I can indeed see output in the logs for the /cable endpoint, but nothing in it looks helpful. Just these exact same info-level entries over and over again:
{
  "ClientAddr": "x.x.x.x:63645",
  "ClientHost": "x.x.x.x",
  "ClientPort": "63645",
  "ClientUsername": "-",
  "DownstreamContentSize": 14,
  "DownstreamStatus": 404,
  "Duration": 8246520,
  "OriginContentSize": 14,
  "OriginDuration": 8210137,
  "OriginStatus": 404,
  "Overhead": 36383,
  "RequestAddr": "domain.com",
  "RequestContentSize": 0,
  "RequestCount": 219,
  "RequestHost": "domain.com",
  "RequestMethod": "GET",
  "RequestPath": "/cable",
  "RequestPort": "-",
  "RequestProtocol": "HTTP/1.1",
  "RequestScheme": "https",
  "RetryAttempts": 0,
  "RouterName": "myapp@docker",
  "ServiceAddr": "172.18.0.7:4000",
  "ServiceName": "myapp-web@docker",
  "ServiceURL": {
    "Scheme": "http",
    "Opaque": "",
    "User": null,
    "Host": "172.18.0.7:4000",
    "Path": "",
    "RawPath": "",
    "OmitHost": false,
    "ForceQuery": false,
    "RawQuery": "",
    "Fragment": "",
    "RawFragment": ""
  },
  "StartLocal": "2024-05-09T15:34:45.644993371Z",
  "StartUTC": "2024-05-09T15:34:45.644993371Z",
  "TLSCipher": "TLS_AES_128_GCM_SHA256",
  "TLSVersion": "1.3",
  "entryPointName": "websecure",
  "level": "info",
  "msg": "",
  "request_User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:125.0) Gecko/20100101 Firefox/125.0",
  "time": "2024-05-09T15:34:45Z"
}
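A note for readers following along: in Traefik's JSON access log, OriginStatus is the status returned by the backend container and DownstreamStatus is what Traefik sent to the client, so the 404 above came from the app at 172.18.0.7:4000 rather than from Traefik itself. A minimal Ruby sketch for pulling those fields out of such a log line is below. The summarize helper and the trimmed sample line are illustrative assumptions, not part of Kamal or Traefik, and the millisecond conversion assumes Duration is reported in nanoseconds.

```ruby
require "json"

# Hypothetical helper: summarise one Traefik JSON access-log line.
# Returns nil when the line is for a different path.
def summarize(line, path: "/cable")
  entry = JSON.parse(line)
  return nil unless entry["RequestPath"] == path

  {
    status_from_app: entry["OriginStatus"],       # what the backend answered
    status_to_client: entry["DownstreamStatus"],  # what Traefik sent back
    backend: entry["ServiceAddr"],                # container that handled it
    duration_ms: entry["Duration"] / 1_000_000.0  # assumes nanoseconds
  }
end

# Trimmed copy of the log entry from this thread (illustrative only).
sample = '{"RequestPath":"/cable","OriginStatus":404,"DownstreamStatus":404,' \
         '"ServiceAddr":"172.18.0.7:4000","Duration":8246520}'

p summarize(sample)
```

If status_from_app is already an error, the request reached the Rails container, so the place to keep looking is the app's configuration rather than the proxy.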
@mbajur Do you have config.hosts set to anything? What about config.action_cable.allowed_request_origins? If you look in the network tab for the websocket request what is the response code?
@nickhammond
Do you have config.hosts set to anything? What about config.action_cable.allowed_request_origins?
irb(main):002:0> Rails.application.config.hosts
=> []
irb(main):003:0> Rails.application.config.action_cable
=> {:mount_path=>"/cable", :precompile_assets=>true}
If you look in the network tab for the websocket request what is the response code?
It just shows as finished, over and over again.
@nickhammond Hmm. Weird, I have a few apps using ActionCable that are deployed via Kamal. It looks like I do set config.action_cable.allowed_request_origins to the hostname, though.
Are you using these vars anywhere to allow/deny in your app?
HOSTNAME: domain.com
APP_DOMAIN: domain.com
That was it! Setting config.action_cable.allowed_request_origins fixed it for me. Thanks a TON! 🙏 And btw - yes, I had these two vars set up in my app ;)
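For future readers, the fix described here can be sketched as follows. Action Cable compares the request's Origin header against each entry in allowed_request_origins, and entries can be Strings or Regexps. The origin_allowed? helper below is a hypothetical stand-in that mirrors that comparison so it can be run outside Rails; the origins use the domain.com placeholder from this thread, and the subdomain pattern is an assumption about what you might want to allow.

```ruby
# Hypothetical value for config/environments/production.rb, i.e.
#   config.action_cable.allowed_request_origins = ALLOWED_ORIGINS
ALLOWED_ORIGINS = [
  "https://domain.com",            # exact match (placeholder domain)
  %r{\Ahttps://.+\.domain\.com\z}  # any subdomain, as a Regexp (assumption)
].freeze

# Illustrative helper, not a Rails API: each allowed entry is compared
# with ===, so Strings must match exactly and Regexps match by pattern.
def origin_allowed?(origin, allowed = ALLOWED_ORIGINS)
  Array(allowed).any? { |entry| entry === origin }
end

puts origin_allowed?("https://domain.com")      # prints true
puts origin_allowed?("https://www.domain.com")  # prints true
puts origin_allowed?("http://domain.com")       # prints false (scheme differs)
```

Note that the scheme is part of the comparison, so an entry for https://domain.com will not match a page served over plain http.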
Just for some potential readers in the future:
I tried various allowed_request_origins configurations, but nothing worked for me.
In the end I had to not only set allowed_request_origins but also turn off config.force_ssl (it was set to true).
It doesn't seem like a great solution, but hey, it works 😅
I hope to find a better one in the future.
I am getting the same error with Kamal 2 in production. In my logs I see:
ActionController::RoutingError (No route matches [GET] "/cable"):
But running Rails.application.config.action_cable in the console returns:
{:mount_path=>"/cable", :precompile_assets=>true, :allowed_request_origins=>["https://example.app"]}
Even setting config.force_ssl = false doesn't help.