matrix-docker-ansible-deploy

Nginx cannot start after server reboot

Open Seele-Vollerei32 opened this issue 1 year ago • 7 comments

Playbook Configuration:

My vars.yml file looks like this:

---
# The bare domain name which represents your Matrix identity.
# Matrix user ids for your server will be of the form (`@user:<matrix-domain>`).
#
# Note: this playbook does not touch the server referenced here.
# Installation happens on another server ("matrix.<matrix-domain>").
#
# If you've deployed using the wrong domain, you'll have to run the Uninstalling step,
# because you can't change the Domain after deployment.
#
# Example value: example.com
matrix_domain: atunemic.cn

# The Matrix homeserver software to install.
# See `roles/matrix-base/defaults/main.yml` for valid options.
matrix_homeserver_implementation: dendrite

# A secret used as a base, for generating various other secrets.
# You can put any string here, but generating a strong one is preferred (e.g. `pwgen -s 64 1`).
matrix_homeserver_generic_secret_key: '***'

# This is something which is provided to Let's Encrypt when retrieving SSL certificates for domains.
#
# In case SSL renewal fails at some point, you'll also get an email notification there.
#
# If you decide to use another method for managing SSL certificates (different than the default Let's Encrypt),
# you won't be required to define this variable (see `docs/configuring-playbook-ssl-certificates.md`).
#
# Example value: [email protected]
matrix_ssl_lets_encrypt_support_email: '[email protected]'

# A Postgres password to use for the superuser Postgres user (called `matrix` by default).
#
# The playbook creates additional Postgres users and databases (one for each enabled service)
# using this superuser account.
matrix_postgres_connection_password: '***'

matrix_synapse_enable_registration: true
matrix_synapse_registration_requires_token: true
matrix_synapse_registrations_require_3pid: 'email'

matrix_prometheus_enabled: false

matrix_prometheus_node_exporter_enabled: false

matrix_grafana_enabled: false

matrix_grafana_anonymous_access: false

# This has no relation to your Matrix user id. It can be any username you'd like.
# Changing the username subsequently won't work.
matrix_grafana_default_admin_user: "kevin"

# Changing the password subsequently won't work.
matrix_grafana_default_admin_password: "***"

matrix_synapse_admin_enabled: flase

matrix_synapse_ext_password_provider_shared_secret_auth_enabled: false
matrix_synapse_ext_password_provider_shared_secret_auth_shared_secret: ***

matrix_bot_mjolnir_enabled: true
matrix_bot_mjolnir_access_token: "***"
matrix_bot_mjolnir_management_room: "!AVjqHyfcl6BsDRTO:atunemic.cn"
matrix_synapse_ext_spam_checker_mjolnir_antispam_enabled: true
matrix_synapse_ext_spam_checker_mjolnir_antispam_config_block_invites: false
matrix_synapse_ext_spam_checker_mjolnir_antispam_config_block_messages: false
matrix_synapse_ext_spam_checker_mjolnir_antispam_config_block_usernames: false
matrix_synapse_ext_spam_checker_mjolnir_antispam_config_ban_lists: []

matrix_mautrix_telegram_enabled: false
matrix_mautrix_telegram_api_id: 9609852
matrix_mautrix_telegram_api_hash: ***
matrix_mautrix_telegram_bot_token: ***
matrix_mautrix_telegram_configuration_extension_yaml: |
  bridge:
    permissions:
      '*': relaybot
      '@kevin_liu:atunemic.cn': admin


matrix_dimension_enabled: true
matrix_dimension_access_token: "***"
matrix_dimension_admins:
  - "@kevin_liu:{{ matrix_domain }}"

matrix_s3_media_store_enabled: false
matrix_s3_media_store_bucket_name: "matrix-1302020253"
matrix_s3_media_store_aws_access_key: "***"
matrix_s3_media_store_aws_secret_key: "***"
matrix_s3_media_store_custom_endpoint_enabled: true
# Example: "https://storage.googleapis.com"
matrix_s3_media_store_custom_endpoint: "***"

matrix_bot_matrix_registration_bot_enabled: true
# Token obtained via logging into the bot account (see above)
matrix_bot_matrix_registration_bot_bot_access_token: "***"

# Enables registration
matrix_synapse_enable_registration: true

# Restrict registration to users with a token
matrix_synapse_registration_requires_token: true

matrix_ma1sd_enabled: true

matrix_synapse_log_level: "INFO"
matrix_synapse_storage_sql_log_level: "INFO"
matrix_synapse_root_log_level: "INFO"

Matrix Server:

  • OS: archlinux
  • Architecture: amd64

Problem description:

Before I rebooted my server, the web UI would not open. I rebooted the server because I thought the load was too heavy for it. But after the reboot, it still won't open.

Additional context: `journalctl -fu matrix-nginx-proxy.service`

Aug 30 16:40:57 archlinux systemd[1]: matrix-nginx-proxy.service: Scheduled restart job, restart counter is at 71.
Aug 30 16:40:57 archlinux systemd[1]: Stopped Matrix nginx-proxy server.
Aug 30 16:40:57 archlinux systemd[1]: Starting Matrix nginx-proxy server...
Aug 30 16:40:57 archlinux systemd[1]: Started Matrix nginx-proxy server.
Aug 30 16:40:57 archlinux matrix-nginx-proxy[22486]: docker: Error response from daemon: driver failed programming external connectivity on endpoint matrix-nginx-proxy (87aad82eee715c36c5c704e9b17f295b0e7f3fff8ef4ebceb9705197d88cb30d): Bind for 0.0.0.0:8448 failed: port is already allocated.
Aug 30 16:40:57 archlinux systemd[1]: matrix-nginx-proxy.service: Main process exited, code=exited, status=125/n/a
Aug 30 16:40:57 archlinux systemd[1]: matrix-nginx-proxy.service: Failed with result 'exit-code'.

The service keeps restarting in this cycle.

Seele-Vollerei32 avatar Aug 30 '22 08:08 Seele-Vollerei32

See what else could be occupying port 8448 and preventing matrix-nginx-proxy.service from starting.

netstat -anp | grep :8448 may help.

Perhaps you had a manually installed Synapse in the past?

spantaleev avatar Aug 30 '22 08:08 spantaleev

I installed Dendrite manually in the past (from the AUR). Here is the output of netstat:

tcp        0      0 0.0.0.0:8448            0.0.0.0:*               LISTEN      992/docker-proxy    
tcp6       0      0 :::8448                 :::*                    LISTEN      997/docker-proxy

I have stopped all the Matrix services with ansible-playbook -i inventory/hosts setup.yml --tags=stop

Seele-Vollerei32 avatar Aug 30 '22 09:08 Seele-Vollerei32
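
Both listeners in the netstat output above belong to docker-proxy, which suggests the port is published by a running container rather than by a process installed outside Docker. A minimal sketch for narrowing that down, assuming `docker` and `awk` are available on the server; the helper names `port_listeners` and `containers_publishing` are hypothetical and not part of the playbook:

```shell
#!/bin/sh
# Hypothetical helpers; the names are illustrative only.

# Read `netstat -anp`-style text on stdin and print the PID/program column
# of every LISTEN line whose local address ends in the given port.
port_listeners() {
    port="$1"
    awk -v port="$port" '
        $6 == "LISTEN" && $4 ~ (":" port "$") { print $7 }
    '
}

# List running containers that publish the given port on the host.
containers_publishing() {
    docker ps --filter "publish=$1" --format '{{.Names}}: {{.Ports}}'
}
```

Something like `netstat -anp | port_listeners 8448` followed by `containers_publishing 8448` should show whether another container already holds the binding.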

I am encountering what I believe to be the same issue.

@Seele-Vollerei32 – Did you find a solution? I'm also curious: is your server pretty low-powered (low memory / CPU)?

Some more details of my issue:

  • I've tried switching between playbook-managed-traefik and playbook-managed-nginx to debug (and because I would take anything that lets me use my server in the short term). The same error message happens when using matrix_playbook_reverse_proxy_type: playbook-managed-traefik and when using playbook-managed-nginx – logs say that Bind for 0.0.0.0:8448 failed: port is already allocated
  • The logs for nginx / traefik show that nginx/traefik is repeatedly attempting to start after each failure
  • The process using the 8448 port is docker-proxy, matching previous comment
  • docker ps shows that matrix-synapse indeed has a binding to 8448/tcp
  • I've tried doing just stop-all -> just setup-all multiple times, thinking that maybe matrix-synapse needs to be down before Traefik/nginx starts; no success
  • I'm using a low-powered server (Oracle Cloud VM.Standard.E2.1.Micro: 1GB of memory + 1GB swapfile). (Could Traefik be racing matrix-synapse? like, maybe other servers launch Traefik more quickly, allowing it to bind to 8448, and somehow matrix-synapse binds to the same port after Traefik launches... somehow? 🤷)
  • My setup is very vanilla – no custom webserver, nothing else on the machine apart from what's deployed by matrix-docker-ansible-deploy
  • Right before I had this issue, I had some failed setup-alls, since my machine was running out of memory mid-setup: server became unresponsive and I had to force reboot. This is no longer happening after I added a swapfile. Not sure if this is relevant.
  • systemctl list-units shows that matrix-container-socket-proxy.service is not found. Checking journalctl for this service shows some logs including Can't open server state file '/var/lib/haproxy/server-state': No such file or directory
  • My SSL certificates are expired – I started this upgrade process to try fixing certbot failing to autorenew SSL.
  • just setup-all fails waiting for Traefik / nginx to start (see log below)
  • I'm pretty sure this is not the same as https://github.com/spantaleev/matrix-docker-ansible-deploy/issues/1687 – I don't see any logs referencing the .pem files when running playbook-managed-nginx
  • Tried the following:
    • docker kill / stopping matrix-synapse, then waiting for nginx to automatically try starting again. This fails with the same error; it looks like matrix-synapse automatically restarts itself and binds to 8448 again. (Is this correct? Doing docker inspect matrix-synapse, I see RestartPolicy is no, which is unexpected.)
    • Set matrix_synapse_federation_port_enabled, matrix_nginx_proxy_proxy_matrix_federation_api_enabled, matrix_synapse_reverse_proxy_companion_federation_api_enabled all to false to try to disable the 8448 port on matrix-synapse (following these docs); ran setup-all – same issue

Happy to make a new issue, but this does sound like the same issue.
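
On the RestartPolicy question above: the proxy logs in this thread show systemd scheduling the restarts ("Scheduled restart job"). If matrix-synapse is managed the same way (an assumption, not confirmed here), a Docker restart policy of `no` would be consistent with the container coming back after `docker kill`. A minimal sketch for comparing the two settings; the helper name `show_restart_config` is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper; the name is illustrative only.

# Print Docker's restart policy for a container, then the Restart= setting
# of the systemd unit that shares its name, so the two can be compared.
show_restart_config() {
    name="$1"
    printf 'docker restart policy: '
    docker inspect --format '{{.HostConfig.RestartPolicy.Name}}' "$name"
    printf 'systemd unit setting: '
    systemctl show --property=Restart "$name.service"
}
```

If `show_restart_config matrix-synapse` reports a systemd restart setting, stopping the unit with `systemctl stop matrix-synapse.service` instead of `docker kill` should keep the container down while port 8448 is being investigated.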

vars.yml
---
matrix_domain: earthchat.online
matrix_homeserver_implementation: synapse
matrix_homeserver_generic_secret_key: 'redacted'
matrix_ssl_lets_encrypt_support_email: 'redacted'
devture_postgres_connection_password: 'redacted'
matrix_synapse_admin_enabled: true
matrix_sygnal_enabled: true
matrix_sygnal_apps: 'redacted'

# Disable non-required services
matrix_ma1sd_enabled: false
matrix_mailer_enabled: false
matrix_coturn_enabled: false

matrix_playbook_reverse_proxy_type: playbook-managed-nginx

# also tried with:
# matrix_playbook_reverse_proxy_type: playbook-managed-traefik
# devture_traefik_config_certificatesResolvers_acme_email: 'redacted'

journalctl -fu matrix-nginx-proxy.service

(repeating)

Mar 30 16:53:48 synapse-avenue systemd[1]: Started Matrix nginx-proxy server.
Mar 30 16:53:49 synapse-avenue matrix-nginx-proxy[249953]: time="2023-03-30T16:53:49Z" level=error msg="error waiting for container: context canceled"
Mar 30 16:53:49 synapse-avenue matrix-nginx-proxy[249953]: Error response from daemon: driver failed programming external connectivity on endpoint matrix-nginx-proxy (7139dbe699f9e7d414e3eea5d3413dc401f7c88f27c60f6e4fdb125c0bc7a473): Bind for 0.0.0.0:8448 failed: port is already allocated
Mar 30 16:53:49 synapse-avenue systemd[1]: matrix-nginx-proxy.service: Main process exited, code=exited, status=1/FAILURE
Mar 30 16:53:49 synapse-avenue systemd[1]: matrix-nginx-proxy.service: Failed with result 'exit-code'.
Mar 30 16:54:19 synapse-avenue systemd[1]: matrix-nginx-proxy.service: Scheduled restart job, restart counter is at 118.
Mar 30 16:54:19 synapse-avenue systemd[1]: Stopped Matrix nginx-proxy server.
Mar 30 16:54:19 synapse-avenue systemd[1]: Starting Matrix nginx-proxy server...
Mar 30 16:54:20 synapse-avenue matrix-nginx-proxy[250035]: 3ede0cf7b5554906135a5060c094f86b7fdc5cbdc617bcb4813fa6b3c51ca8e7
Mar 30 16:54:20 synapse-avenue systemd[1]: Started Matrix nginx-proxy server.

journalctl -fu matrix-traefik.service

Very similar to nginx above

Mar 30 18:29:53 synapse-avenue systemd[1]: matrix-traefik.service: Failed with result 'exit-code'.
Mar 30 18:30:23 synapse-avenue systemd[1]: matrix-traefik.service: Scheduled restart job, restart counter is at 1777.
Mar 30 18:30:23 synapse-avenue systemd[1]: Stopped Traefik (matrix-traefik).
Mar 30 18:30:23 synapse-avenue systemd[1]: Starting Traefik (matrix-traefik)...
Mar 30 18:30:24 synapse-avenue matrix-traefik[271921]: 2c521dd7c60ee481943eb4235757f020b4c7840cb316ec9a10374b2d7adb4515
Mar 30 18:30:24 synapse-avenue systemd[1]: Started Traefik (matrix-traefik).
Mar 30 18:30:24 synapse-avenue matrix-traefik[271933]: Error response from daemon: driver failed programming external connectivity on endpoint matrix-traefik (2cabc67bfdf14556207a53f1a5990a81be00c17eaaa9d7ec81768d190ada94c5): Bind for 0.0.0.0:8448 failed: port is already allocated
Mar 30 18:30:24 synapse-avenue systemd[1]: matrix-traefik.service: Main process exited, code=exited, status=1/FAILURE
Mar 30 18:30:24 synapse-avenue systemd[1]: matrix-traefik.service: Failed with result 'exit-code'.

journalctl -fu matrix-container-socket-proxy.service
-- Logs begin at Fri 2023-03-17 08:25:14 UTC. --
Mar 30 15:51:39 synapse-avenue systemd[1]: matrix-container-socket-proxy.service: Main process exited, code=exited, status=137/n/a
Mar 30 15:51:39 synapse-avenue systemd[1]: matrix-container-socket-proxy.service: Failed with result 'exit-code'.
Mar 30 15:51:39 synapse-avenue systemd[1]: Stopped Container Socket Proxy (matrix-container-socket-proxy).
Mar 30 18:24:32 synapse-avenue systemd[1]: Starting Container Socket Proxy (matrix-container-socket-proxy)...
Mar 30 18:24:34 synapse-avenue matrix-container-socket-proxy[269618]: e118ffac4824afe0e1aaeaca1e25c947c163cd78105a5cbac241fffb37c00b13
Mar 30 18:24:34 synapse-avenue systemd[1]: Started Container Socket Proxy (matrix-container-socket-proxy).
Mar 30 18:24:38 synapse-avenue matrix-container-socket-proxy[269625]: [WARNING] 088/182438 (1) : Can't open server state file '/var/lib/haproxy/server-state': No such file or directory
Mar 30 18:24:38 synapse-avenue matrix-container-socket-proxy[269625]: [NOTICE] 088/182438 (1) : New worker #1 (7) forked
Mar 30 18:24:38 synapse-avenue matrix-container-socket-proxy[269625]: Proxy dockerbackend started.
Mar 30 18:24:38 synapse-avenue matrix-container-socket-proxy[269625]: Proxy dockerfrontend started.

Failure of `just setup-all`
TASK [galaxy/com.devture.ansible.role.systemd_service_manager : Fail if service isn't detected to be running] ***
failed: [matrix.earthchat.online] (item=matrix-traefik.service) => changed=false
  ansible_loop_var: item
  item: matrix-traefik.service
  msg: matrix-traefik.service was not detected to be running. It's possible that there's a configuration problem or another service on your server interferes with it (uses the same ports, etc.). Try running `systemctl status matrix-traefik.service` and `journalctl -fu matrix-traefik.service` on the server to investigate. If you're on a slow or overloaded server, it may be that services take a longer time to start and that this error is a false-positive. You can consider raising the value of the `devture_systemd_service_manager_up_verification_delay_seconds` variable. See `/redacted/matrix-docker-ansible-deploy/roles/galaxy/com.devture.ansible.role.systemd_service_manager/defaults/main.yml` for more details about that.

davidisaaclee avatar Mar 30 '23 16:03 davidisaaclee

I could not figure out what the issue was, but migrating to a new instance by following https://github.com/spantaleev/matrix-docker-ansible-deploy/blob/master/docs/maintenance-migrating.md got me back up and running :(

davidisaaclee avatar Apr 03 '23 21:04 davidisaaclee

Had the same issue: for me the traefik service would not start because the port was already allocated. @davidisaaclee's comment pretty much summed up all the symptoms. I was running on a 1 GB instance (t3a.micro EC2) and had failed setup-alls as well, which resulted in this broken state. It was a new install, so I didn't need to preserve any configs; after docker system prune -a, removing /matrix/, and adding a 2 GB swap file, the install went without a hitch.

artu-ole avatar Jun 15 '23 11:06 artu-ole

I had the very same issue, but I had some legacy configs in the following folder on the host: /matrix/nginx-proxy/conf.d/. First I deleted everything there. After this I noticed that some networks in docker network ls seemed odd. After running ansible-playbook -i inventory/hosts setup.yml --tags=stop, I ran docker network prune on the host system. Now everything works.

luschmar avatar Jul 03 '23 20:07 luschmar

Had a similar issue to this today. Everything seemed to be working, but the built-in traefik container kept crashing because of the error

Error response from daemon: driver failed programming external connectivity on endpoint matrix-traefik (---): Bind for 0.0.0.0:8448 failed: port is already allocated

Turns out docker-proxy was using that port for some reason; restarting Docker as a whole fixed the issue. Just to be safe, though, I did a docker system prune --all --volumes to delete all the containers and networks and start over.

gouthamravee avatar Jul 31 '23 14:07 gouthamravee
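
The less destructive fixes reported in this thread (restarting Docker, pruning unused networks) can be sketched as one server-side routine. This is a sketch under assumptions: it expects the playbook's services to have been stopped first (for example with `--tags=stop` from the control machine), it needs root privileges, and the helper names `run` and `free_stale_port` are hypothetical. `docker system prune --all --volumes` is left out on purpose because it deletes data.

```shell
#!/bin/sh
# Hypothetical helpers; the names are illustrative only.

# Print the command instead of running it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# Restart the Docker daemon, remove unused networks (Docker asks for
# confirmation), then list whatever still listens on the given port
# (default 8448). Each step runs only if the previous one succeeded.
free_stale_port() {
    port="${1:-8448}"
    run systemctl restart docker &&
    run docker network prune &&
    run ss -ltnp "sport = :$port"
}
```

Setting `DRY_RUN=1` before calling `free_stale_port` prints the three commands without executing them, which makes it easy to review the steps before touching a live server.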