matrix-docker-ansible-deploy
Dimension starts but doesn't run successfully
Describe the bug
Enabling dimension in the playbook succeeds with no errors, and I am able to load the test page at dimension.
[MatrixHttpClient (REQ-183)] GET http://matrix-nginx-proxy:12080/_matrix/client/r0/sync
May 23 09:59:09 <hostname> matrix-dimension[1915874]: Mon, 23 May 2022 13:59:09 GMT [DEBUG] [MatrixClientLite] Received sync. Next token: <TOKEN>
May 23 09:59:09 <hostname> matrix-dimension[1915874]: Mon, 23 May 2022 13:59:09 GMT [DEBUG] [MatrixClientLite] Performing sync with token <TOKEN>
Every time I start Dimension with the standard Ansible playbook, the following error occurs midway through startup:
May 23 09:33:02 <hostname> matrix-dimension[1914168]: Mon, 23 May 2022 13:33:02 GMT [DEBUG] [MatrixHttpClient (REQ-1)] GET http://matrix-nginx-proxy:12080/_matrix/client/r0/account/whoami
May 23 09:33:02 <hostname> matrix-dimension[1914168]: Mon, 23 May 2022 13:33:02 GMT [ERROR] [MatrixHttpClient (REQ-1)] Error: connect ECONNREFUSED <IP Address>:12080
May 23 09:33:02 <hostname> matrix-dimension[1914168]: at TCPConnectWrap.afterConnect [as oncomplete] (net.js:1148:16) {
May 23 09:33:02 <hostname> matrix-dimension[1914168]: errno: -111,
May 23 09:33:02 <hostname> matrix-dimension[1914168]: code: 'ECONNREFUSED',
May 23 09:33:02 <hostname> matrix-dimension[1914168]: syscall: 'connect',
May 23 09:33:02 <hostname> matrix-dimension[1914168]: address: '<IP Address>',
May 23 09:33:02 <hostname> matrix-dimension[1914168]: port: 12080
May 23 09:33:02 <hostname> matrix-dimension[1914168]: }
May 23 09:33:02 <hostname> matrix-dimension[1914168]: Error: connect ECONNREFUSED <IP Address>:12080
May 23 09:33:02 <hostname> matrix-dimension[1914168]: at TCPConnectWrap.afterConnect [as oncomplete] (net.js:1148:16) {
May 23 09:33:02 <hostname> matrix-dimension[1914168]: errno: -111,
May 23 09:33:02 <hostname> matrix-dimension[1914168]: code: 'ECONNREFUSED',
May 23 09:33:02 <hostname> matrix-dimension[1914168]: syscall: 'connect',
May 23 09:33:02 <hostname> matrix-dimension[1914168]: address: '<IP Address>',
May 23 09:33:02 <hostname> matrix-dimension[1914168]: port: 12080
May 23 09:33:02 <hostname> matrix-dimension[1914168]: }
Running the t2bot connection test widget fails on the homeserver step as well.
Error contacting homeserver. This usually means your federation setup is incorrect, or your homeserver is offline. Consult your homeserver's documentation for how to set up federation.
To Reproduce
My vars.yml file looks like this:
# The bare domain name which represents your Matrix identity.
# Matrix user ids for your server will be of the form (`@user:<matrix-domain>`).
#
# Note: this playbook does not touch the server referenced here.
# Installation happens on another server ("matrix.<matrix-domain>").
#
# If you've deployed using the wrong domain, you'll have to run the Uninstalling step,
# because you can't change the Domain after deployment.
#
# Example value: example.com
matrix_domain: <domain>
# The Matrix homeserver software to install.
# See `roles/matrix-base/defaults/main.yml` for valid options.
matrix_homeserver_implementation: synapse
# A secret used as a base, for generating various other secrets.
# You can put any string here, but generating a strong one is preferred (e.g. `pwgen -s 64 1`).
matrix_homeserver_generic_secret_key: '<secret>'
# This is something which is provided to Let's Encrypt when retrieving SSL certificates for domains.
#
# In case SSL renewal fails at some point, you'll also get an email notification there.
#
# If you decide to use another method for managing SSL certificates (different than the default Let's Encrypt),
# you won't be required to define this variable (see `docs/configuring-playbook-ssl-certificates.md`).
#
# Example value: [email protected]
matrix_ssl_lets_encrypt_support_email: '<email>'
#########
##JITSI##
#########
# A Postgres password to use for the superuser Postgres user (called `matrix` by default).
#
# The playbook creates additional Postgres users and databases (one for each enabled service)
# using this superuser account.
matrix_postgres_connection_password: '<secret>'
matrix_jitsi_enabled: true
# Run `bash inventory/scripts/jitsi-generate-passwords.sh` to generate these passwords,
# or define your own strong passwords manually.
matrix_jitsi_jicofo_auth_password: <secret>
matrix_jitsi_jvb_auth_password: <secret>
matrix_jitsi_jibri_recorder_password: <secret>
matrix_jitsi_jibri_xmpp_password: <secret>
matrix_jitsi_jvb_container_extra_arguments:
- '--env "DOCKER_HOST_ADDRESS=<local IP>"'
matrix_jitsi_web_custom_config_extension: |
  config.enableLayerSuspension = true;
  config.disableAudioLevels = true;
  // Limit the number of video feeds forwarded to each client
  config.channelLastN = 4;
###########
##GRAFANA##
###########
matrix_prometheus_enabled: true
matrix_prometheus_node_exporter_enabled: true
matrix_grafana_enabled: true
matrix_grafana_anonymous_access: false
# This has no relation to your Matrix user id. It can be any username you'd like.
# Changing the username subsequently won't work.
matrix_grafana_default_admin_user: <admin user>
# Changing the password subsequently won't work.
matrix_grafana_default_admin_password: <password>
#########
# NGINX #
#########
matrix_nginx_proxy_access_log_enabled: false
#############
# DIMENSION #
#############
matrix_dimension_enabled: true
matrix_dimension_admins:
- "@dimension:matrix.<domain>"
- "@<admin user>:matrix.<domain>"
matrix_dimension_access_token: "<dimension's access token>"
############
# POSTGRES #
############
matrix_postgres_process_extra_arguments: [
"-c 'max_connections=200'",
"-c 'shared_buffers=512MB'",
"-c 'effective_cache_size=1536MB'",
"-c 'maintenance_work_mem=128MB'",
"-c 'checkpoint_completion_target=0.9'",
"-c 'wal_buffers=16MB'",
"-c 'default_statistics_target=100'",
"-c 'random_page_cost=1.1'",
"-c 'effective_io_concurrency=200'",
"-c 'work_mem=2621kB'",
"-c 'min_wal_size=1GB'",
"-c 'max_wal_size=4GB'",
"-c 'max_worker_processes=2'",
"-c 'max_parallel_workers_per_gather=1'",
"-c 'max_parallel_workers=2'",
"-c 'max_parallel_maintenance_workers=1'",
]
#########
# OTHER #
#########
matrix_synapse_admin_enabled: true
matrix_registration_enabled: true
# Generate a strong secret using: `pwgen -s 64 1`.
matrix_registration_admin_secret: "<admin secret>"
matrix_synapse_configuration_extension_yaml: |
  retention:
    enabled: true
    purge_jobs:
      - longest_max_lifetime: 3d
        shortest_max_lifetime: 1d
        interval: 4h
    default_policy:
      min_lifetime: 1d
      max_lifetime: 60h
    allowed_lifetime_max: 3d
Expected behavior
- Homeserver test should succeed
- Should be able to successfully use widgets with the locally hosted Dimension
Matrix Server:
- OS: Debian 11
- Architecture: amd64
Additional context
I am running the following command to start the playbook:
sudo ansible-playbook -i </path/to/playbook>/inventory/hosts setup.yml --tags=start -e ansible_python_interpreter=/usr/bin/python3
same here
I have had the same issue for a few months now.
All services do start successfully, but the ansible command ansible-playbook -i inventory/hosts setup.yml --tags=start fails with the above error.
Debugging a bit, it seems that Dimension does start up, but it is faster than matrix-nginx-proxy and begins issuing requests when matrix-nginx-proxy is not yet ready to receive them.
This causes the container to fail, and the service gets restarted after 30 seconds. That in turn causes the playbook to fail, since matrix_common_after_systemd_service_start_wait_for_timeout_seconds defaults to 15 seconds.
To verify my suspicion, I set matrix_common_after_systemd_service_start_wait_for_timeout_seconds to 45 seconds and ran the command ansible-playbook -i inventory/hosts setup.yml --tags=start again.
This stopped the error from occurring.
I am not certain how this could be addressed, but wanted to give some more info on this.
We could introduce an intentional delay to matrix-dimension.service (the systemd service starting Dimension).
We could also open an issue in the Dimension repository and ask to change Dimension so that it doesn't hard-fail when the homeserver is temporarily unavailable. Not sure how maintained Dimension is nowadays (I suspect it's not), so we'll probably be out of luck reporting issues there.
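To illustrate the idea of not starting Dimension before the homeserver is reachable, here is a minimal sketch of a polling loop. Note that this is a retry loop rather than the fixed delay suggested above, and it is not something the playbook or Dimension ships. The helper name `retry_until_ok` is hypothetical. The example URL reuses the `matrix-nginx-proxy:12080` address from the logs. Two things are assumptions: that the unauthenticated `/_matrix/client/versions` endpoint is reachable through that proxy, and that `curl` is available wherever this runs (the hostname only resolves inside the Docker network).

```shell
#!/bin/sh
# Hypothetical helper: run a command until it succeeds or the attempts run out.
# Usage: retry_until_ok <attempts> <delay-seconds> <command> [args...]
# Returns 0 as soon as the command succeeds, 1 if every attempt failed.
# POSIX sh has no `local`, so the variables below are global.
retry_until_ok() {
  attempts="$1"
  delay="$2"
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    i=$((i + 1))
    # Only sleep if another attempt is still to come.
    if [ "$i" -le "$attempts" ]; then
      sleep "$delay"
    fi
  done
  return 1
}

# Example (assumption: endpoint and curl availability, see above):
#   retry_until_ok 30 2 curl -fsS -o /dev/null \
#     http://matrix-nginx-proxy:12080/_matrix/client/versions
```

Compared to a fixed sleep, a loop like this waits only as long as needed on fast hosts and still gives slow hosts more time, at the cost of needing a place to run it before Dimension starts.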
I also found that setting the variable (now called devture_systemd_service_manager_up_verification_delay_seconds) to 60 seconds solves the error on my slower machine, while 45 was still too slow.
However, on my production host, which is way beefier, the default 15s seems to be enough.