
Queue and Scheduler reporting unhealthy in Docker

Open · thelooter opened this issue 1 year ago · 1 comment

I'm using the provided self-hosting example from the selfhosting repository.

Both Queue and Scheduler report a failing healthcheck.
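Container logs do not include the output of the health check itself; Docker records each health-check run separately in the container's state. The sketch below wraps two standard Docker CLI calls in helper functions. The names health_of and unhealthy_containers are hypothetical and not part of solidtime. docker inspect prints the container's .State.Health object, which includes the recent check runs with their exit codes and captured output. docker ps --filter health=unhealthy lists the running containers Docker currently marks unhealthy.

```shell
# Hypothetical helpers around standard Docker CLI commands.

# Print the health state of one container as JSON, including the recent
# health-check runs (exit code and captured output of each run).
health_of() {
  docker inspect --format '{{json .State.Health}}' "$1"
}

# List the names of running containers that Docker currently marks unhealthy.
unhealthy_containers() {
  docker ps --filter health=unhealthy --format '{{.Names}}'
}
```

For example, health_of 1-docker-with-database-queue-1 would show what the failing check in the queue container actually printed.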

I grepped both logs for lines without "level=info" to find any errors:

Scheduler

docker logs 1-docker-with-database-scheduler-1 2>&1 | grep -v "level=info"

Container mode: scheduler

   INFO  Clearing cached bootstrap files.  

  cache .......................................................... 4.82ms DONE
  compiled ....................................................... 1.40ms DONE
  config ......................................................... 0.63ms DONE
  events ......................................................... 0.55ms DONE
  routes ......................................................... 0.59ms DONE
  views .......................................................... 4.68ms DONE


   INFO  Events cached successfully.  


   INFO  Configuration cached successfully.  


   INFO  Routes cached successfully.  

2024-07-01 13:05:53,926 INFO Included extra file "/etc/supervisor/supervisord.conf" during parsing
2024-07-01 13:05:53,930 INFO Set uid to user 1000 succeeded
2024-07-01 13:05:53,946 INFO RPC interface 'supervisor' initialized
2024-07-01 13:05:53,946 CRIT Server 'unix_http_server' running without any HTTP authentication checking
2024-07-01 13:05:53,946 INFO supervisord started with pid 1
2024-07-01 13:05:54,951 INFO spawned: 'clear-scheduler-cache_00' with pid 36
2024-07-01 13:05:54,965 INFO spawned: 'scheduler_00' with pid 37
2024-07-01 13:05:55,035 INFO success: clear-scheduler-cache_00 entered RUNNING state, process has stayed up for > than 0 seconds (startsecs)
2024-07-01 13:05:56,037 INFO success: scheduler_00 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)

   INFO  No mutex files were found.  

2024-07-01 13:05:59,942 INFO exited: clear-scheduler-cache_00 (exit status 0; expected)
2024-07-09 13:26:31,451 INFO reaped unknown pid 1329272 (exit status 1)
2024-07-09 13:29:04,033 INFO reaped unknown pid 1329553 (exit status 1)
2024-07-10 17:02:21,954 INFO reaped unknown pid 1518999 (exit status 1)

Queue

docker logs 1-docker-with-database-queue-1 2>&1 | grep -v "level=info"

Container mode: worker

   INFO  Clearing cached bootstrap files.  

  cache .......................................................... 4.75ms DONE
  compiled ....................................................... 1.35ms DONE
  config ......................................................... 0.67ms DONE
  events ......................................................... 0.53ms DONE
  routes ......................................................... 0.60ms DONE
  views .......................................................... 4.63ms DONE


   INFO  Events cached successfully.  


   INFO  Configuration cached successfully.  


   INFO  Routes cached successfully.  

2024-07-01 13:05:53,895 INFO Included extra file "/etc/supervisor/supervisord.conf" during parsing
2024-07-01 13:05:53,896 INFO Set uid to user 1000 succeeded
2024-07-01 13:05:53,907 INFO RPC interface 'supervisor' initialized
2024-07-01 13:05:53,908 CRIT Server 'unix_http_server' running without any HTTP authentication checking
2024-07-01 13:05:53,908 INFO supervisord started with pid 1
2024-07-01 13:05:54,913 INFO spawned: 'worker_00' with pid 36
2024-07-01 13:05:54,915 INFO reaped unknown pid 31 (exit status 1)
2024-07-01 13:05:55,918 INFO success: worker_00 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-07-09 13:26:30,905 INFO reaped unknown pid 1313517 (exit status 1)
2024-07-10 17:02:22,961 INFO reaped unknown pid 1501090 (exit status 1)
2024-07-10 17:04:55,188 INFO reaped unknown pid 1501369 (exit status 1)

The only error I can see is "Server 'unix_http_server' running without any HTTP authentication checking", which I think is just related to running without HTTPS.
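A narrower filter than grep -v "level=info" makes this kind of check repeatable. The sketch below uses a hypothetical helper, filter_problems, which is not part of solidtime or Docker. It assumes supervisord's default "date time,ms LEVEL message" line layout. It keeps only supervisor lines whose level is not INFO or that report a non-zero exit status, and it skips the Laravel bootstrap output entirely.

```shell
# Hypothetical log filter (not part of solidtime or Docker).
# Reads container logs on stdin and prints only supervisord lines that
# either have a level other than INFO or mention a non-zero exit status.
filter_problems() {
  awk '
    # supervisord lines start with a date such as 2024-07-01
    $1 ~ /^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]$/ {
      if ($3 != "INFO") { print; next }
      # "exit status " is 12 characters long; the rest of the match is the code
      if (match($0, /exit status [0-9]+/)) {
        code = substr($0, RSTART + 12, RLENGTH - 12)
        if (code + 0 != 0) print
      }
    }
  '
}
```

Usage: docker logs 1-docker-with-database-queue-1 2>&1 | filter_problems. On the logs quoted above, it keeps only the CRIT line and the "reaped unknown pid ... (exit status 1)" lines, and it drops the expected "exit status 0" line.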

If any additional information is required, I can provide it.

thelooter, Jul 10 '24 17:07

Could you please provide the following information:

  • Which image tag are you using? If you are not using a version-based image tag, please pull the newest image and try again.
  • Did you change anything in the docker-compose file, for example by adding a health check?
  • Could you send me the full log (via Discord if it contains sensitive information)? docker compose logs > log.txt

korridor, Jul 15 '24 10:07

I experience the same behaviour. The app and database containers are healthy, but the queue and scheduler containers are unhealthy. I have not modified the docker-compose file to add a health check, and I am using the latest solidtime image.

RKLBusinessDevelopment, Sep 09 '24 09:09

I'm pretty sure your issue is not directly related to @thelooter's issue. I already debugged their problem via Discord.

We recently removed the health check from the Docker image and added it to the docker-compose.yml file instead. However, we only added it to the app container and not to the queue and scheduler containers, because we haven't had time to find a suitable check for those yet. So for now, you can ignore the queue and scheduler containers reporting unhealthy.
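One possible shape for a check on these two containers is sketched below. It is an assumption, not the approach the solidtime example repository later adopted. It relies on supervisorctl being able to reach supervisord inside the container, since the logs above show a unix_http_server being started. It also relies on the program names worker_00 and scheduler_00 that appear in those logs. The helper program_running is hypothetical. It reads supervisorctl status output on stdin and succeeds only if the named program is in the RUNNING state.

```shell
# Hypothetical health-check helper (not part of solidtime or supervisor).
# Reads `supervisorctl status` output on stdin, where each line looks like
#   worker_00    RUNNING   pid 36, uptime 0:00:01
# and exits 0 only if the program named in $1 is RUNNING, 1 otherwise.
program_running() {
  awk -v prog="$1" '
    $1 == prog && $2 == "RUNNING" { found = 1 }
    END { exit (found ? 0 : 1) }
  '
}
```

In the queue container this might be run as supervisorctl status | program_running worker_00, and in the scheduler container with scheduler_00 instead. A program that has crashed or exited (FATAL, EXITED, BACKOFF) makes the check fail.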

I'll post here once we have updated the example again and added more health checks.

korridor, Sep 09 '24 13:09

I am not too concerned about this issue, as the application runs fine, so I am already happily ignoring the "unhealthy" status of both containers. Thanks for the information, I'll just wait for an updated version then.

RKLBusinessDevelopment, Sep 09 '24 14:09

@RKLBusinessDevelopment I just pushed a commit to the example repository that adds health checks to all containers. (Commit) This should give all your containers a healthy status if they are working correctly.

I'm closing the issue now. If you still have problems after pulling these changes, please reopen it.

korridor, Sep 13 '24 15:09