After reboot: status to server ok, but no connection

Open jdepi opened this issue 7 months ago • 6 comments

I reboot my PILOS server every Friday. Afterwards, PILOS does not seem to reconnect to the local server automatically: the status shows OK, but the connection shows a failure. Once I test the connection manually, it is established.

It would be great if PILOS could retry the connection periodically, for example every 5 minutes, whenever a connection is lost.

To Reproduce

Steps to reproduce the behavior:

Reboot PILOS and check the connection in menu server.

The following screenshot shows the expected behaviour.

Image

jdepi avatar May 13 '25 14:05 jdepi

Hi @jdepi,

PILOS checks the connection every minute using the cron and horizon containers. Please ensure these containers are running. However, I'm not aware of any reason why the connection should be marked as faulty after a restart.

I'm currently creating a PR to fix any issues that can prevent the cron and horizon containers from restarting, which may be related.

samuelwei avatar May 13 '25 19:05 samuelwei

Is there any specific log I can send when it happens next time?

jdepi avatar May 14 '25 07:05 jdepi

First, check whether all containers are up and running (docker compose ps). A full restart should resolve all issues (docker compose down && docker compose up).

You can also check the cron container's logs (docker compose logs cron). There should be a log entry every minute, as in the screenshot below.

Image
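
The two checks above can be wrapped in one small helper. This is only a convenience sketch: the function name show_pilos_state and the choice of 20 log lines are assumptions made for illustration, and it has to be run from the directory that contains the PILOS compose file.

```shell
#!/bin/sh
# Hypothetical helper (not part of PILOS): print container state and recent cron logs.
show_pilos_state() {
    # List all containers of the compose project with their status.
    # The cron and horizon services should both be listed as "Up".
    docker compose ps
    # Show the last 20 log lines of the cron service
    # (20 is an arbitrary value chosen for this sketch).
    docker compose logs --tail 20 cron
}
```

Calling show_pilos_state after a reboot gives both pieces of information in one go.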

samuelwei avatar May 14 '25 07:05 samuelwei

docker compose ps showed the following:

NAME            IMAGE                  COMMAND                  SERVICE   CREATED      STATUS                  PORTS
pilos-app-1     pilos/pilos:4          "entrypoint"             app       8 days ago   Up 12 hours (healthy)   443/tcp, 9000/tcp, 127.0.0.1:5000->80/tcp
pilos-db-1      mariadb:11             "docker-entrypoint.s…"   db        8 days ago   Up 12 hours (healthy)   3306/tcp
pilos-redis-1   redis:7.2-alpine3.18   "docker-entrypoint.s…"   redis     8 days ago   Up 12 hours (healthy)   6379/tcp

After restart:

NAME              IMAGE                  COMMAND                  SERVICE   CREATED          STATUS                    PORTS
pilos-app-1       pilos/pilos:4          "entrypoint"             app       23 seconds ago   Up 17 seconds (healthy)   443/tcp, 9000/tcp, 127.0.0.1:5000->80/tcp
pilos-cron-1      pilos/pilos:4          "pilos-cli run:cron"     cron      23 seconds ago   Up 6 seconds              80/tcp, 443/tcp, 9000/tcp
pilos-db-1        mariadb:11             "docker-entrypoint.s…"   db        23 seconds ago   Up 22 seconds (healthy)   3306/tcp
pilos-horizon-1   pilos/pilos:4          "pilos-cli run:horiz…"   horizon   23 seconds ago   Up 6 seconds              80/tcp, 443/tcp, 9000/tcp
pilos-redis-1     redis:7.2-alpine3.18   "docker-entrypoint.s…"   redis     23 seconds ago   Up 22 seconds (healthy)   6379/tcp
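
The first listing lacks the cron and horizon services that appear after the restart. A minimal sketch of a check for that situation follows. The function name missing_services is made up for this example, and the usage line assumes a Docker Compose v2 release whose ps command supports the --services and --status options.

```shell
#!/bin/sh
# Hypothetical helper (not part of PILOS).
# stdin:     one running service name per line
# arguments: the service names that are expected to be running
# Prints every expected service that is absent from stdin.
missing_services() {
    running=$(cat)
    for svc in "$@"; do
        # -x matches whole lines only, -F treats the name as a fixed string.
        printf '%s\n' "$running" | grep -qxF -- "$svc" || printf '%s\n' "$svc"
    done
}

# Assumed usage, run from the PILOS compose directory:
#   docker compose ps --services --status running \
#       | missing_services app cron db horizon redis
```

An empty result means all five services from the second listing are running.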

jdepi avatar May 14 '25 07:05 jdepi

Samuel, should I create a daily cron for restarting PILOS?

jdepi avatar May 14 '25 09:05 jdepi

I would not recommend restarting PILOS; I see no reason to do so. We keep it running all the time and only stop it for updates.

If you restart your BBB server, it takes a few minutes for PILOS to recognise that the server is stable again, as it was offline for a few minutes during a system restart.

I would not recommend running PILOS on the same server as BBB as this makes updating BBB very difficult. Some BBB version changes require you to wipe the server, and you don't want to lose all your persistent data like rooms, users, etc. BBB updates also tend to change web server configuration files, which can cause issues with PILOS.

samuelwei avatar May 14 '25 11:05 samuelwei

I had the issue again yesterday. I used PILOS itself to test the connection to the server by manually clicking "test connection", which solved the issue. How can I enforce an automatic repair?

jdepi avatar Jun 12 '25 08:06 jdepi

It will repair itself over time. The scheduler checks the server every minute. Depending on your settings, the server will become healthy after 3 min (default). Please note: the issue with failed container restarts has been fixed; however, the PR is not released yet. The PILOS release containing this PR is on the way.
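
To picture the timing, here is a small simulation. It is not PILOS code. It assumes that a server counts as healthy once a configurable number of checks in a row succeed, and that a threshold of 3 corresponds to the 3 min default at one check per minute. The function name minute_healthy is hypothetical.

```shell
#!/bin/sh
# Illustration only, NOT the PILOS implementation.
# Assumption: the server is healthy after `threshold` consecutive successful checks.
# Arguments: threshold, then one check result per minute ("ok" or "fail").
# Prints the minute at which the server becomes healthy, or "never".
minute_healthy() {
    threshold=$1
    shift
    streak=0
    minute=0
    for result in "$@"; do
        minute=$((minute + 1))
        if [ "$result" = "ok" ]; then
            streak=$((streak + 1))
        else
            streak=0   # a failed check resets the streak
        fi
        if [ "$streak" -ge "$threshold" ]; then
            echo "$minute"
            return 0
        fi
    done
    echo "never"
}

# Two failed checks during the reboot, then three successful ones.
minute_healthy 3 fail fail ok ok ok
```

Under these assumptions, a server that comes back after two failed checks is reported healthy at the fifth check, without any manual test.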

samuelwei avatar Jun 12 '25 10:06 samuelwei