Infrequent Dojo crash when using docker-compose driver

Open tomzo opened this issue 3 years ago • 3 comments

The Dojo process crashes from time to time (around 1 in 100) when run with the docker-compose driver. As a result, some of the containers created by docker-compose stay running on the CI agent, because after the crash there is nothing left to clean them up.

It seems that there are 2 things wrong here:

  1. docker-compose ps is called before _app_1 has been created. I suppose the ps call is part of the background monitoring process that checks whether containers are running, but perhaps it's kicking in too early...
  2. Exit status: 1 from docker-compose is causing the Dojo process to crash. That should just never happen.
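To illustrate point 2, here is a minimal sketch of how a failure in the background watcher could be turned into an error rather than a process crash, so that cleanup still runs. This is not Dojo's actual code: `watch`, `runWithCleanup`, `check` and `stop` are hypothetical names, with `check` standing in for the container monitoring and `stop` for whatever runs `docker-compose stop`.

```go
package main

import "fmt"

// watch runs check in a background goroutine. If check panics, the panic is
// recovered and reported as an error on the returned channel instead of
// crashing the whole process. (Hypothetical helper, not part of Dojo.)
func watch(check func()) <-chan error {
	done := make(chan error, 1)
	go func() {
		defer func() {
			if r := recover(); r != nil {
				done <- fmt.Errorf("watcher failed: %v", r)
				return
			}
			done <- nil
		}()
		check()
	}()
	return done
}

// runWithCleanup waits for the watcher to finish and always calls stop,
// whether the watcher succeeded or failed. (Hypothetical helper.)
func runWithCleanup(check func(), stop func()) error {
	defer stop()
	return <-watch(check)
}
```

With something along these lines, a panic inside the monitoring goroutine would surface as a regular error, and the containers would still be stopped before Dojo exits.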

Logs from CI:

2020/12/01 15:15:53 [ 1]  INFO: (main.main) Dojo version 0.7.0
2020/12/01 15:15:53 [ 4]  INFO: (main.DockerComposeDriver.HandleRun) docker-compose run command will be:
 docker-compose -f docker-compose-dtest.yml -f docker-compose-dtest.yml.dojo -p dojo-******** run --rm -T default "./tasks _test_docker"
Creating network "dojo-***********_default" with the default driver
Pulling app (*************.amazonaws.com/**********)...
2bedf8f: Pulling from *****/app
Creating ***********_db_1 ... 
Creating ***********_db_1 ... done
Creating ****************************_app_1 ... 
panic: Unexpected exit status:
Command: docker-compose -f docker-compose-dtest.yml -f docker-compose-dtest.yml.dojo -p dojo-********* ps
  Exit status: 1
  StdOut: <empty string>
  StdErr: No such container: ded74698ff6c7539c16d506ac0d05a8ccc1884e8da4a3030f50c8b68d2de63a2


goroutine 20 [running]:
main.DockerComposeDriver.getDCContainersNames(0x535e80, 0xc00006e0c0, 0x5367e0, 0xc000062300, 0xc000062300, 0xc0000601e0, 0x510825, 0x3, 0x7ffd4fef2a4a, 0xe, ...)
	/dojo/work/src/dojo/docker_compose_driver.go:601 +0x7f3
main.DockerComposeDriver.waitForContainersToBeRunning(0x535e80, 0xc00006e0c0, 0x5367e0, 0xc000062300, 0xc000062300, 0xc0000601e0, 0x510825, 0x3, 0x7ffd4fef2a4a, 0xe, ...)
	/dojo/work/src/dojo/docker_compose_driver.go:237 +0x170
main.DockerComposeDriver.watchContainers(0x535e80, 0xc00006e0c0, 0x5367e0, 0xc000062300, 0xc000062300, 0xc0000601e0, 0x510825, 0x3, 0x7ffd4fef2a4a, 0xe, ...)
	/dojo/work/src/dojo/docker_compose_driver.go:270 +0x1d6
created by main.DockerComposeDriver.HandleRun
	/dojo/work/src/dojo/docker_compose_driver.go:390 +0x5ec
Creating  ****************************_app_1  ... done

tomzo, Dec 02 '20 15:12

A workaround was released in Dojo 0.10.3; however, I couldn't reproduce this error.

xmik, Dec 07 '20 09:12

Reproduced on CircleCI here: https://app.circleci.com/pipelines/github/kudulab/dojo/51/workflows/43161f70-6d0f-40c0-9a63-59e17e21b965/jobs/175 using commit https://github.com/kudulab/dojo/commit/5a344fec307a09190138b7f0c275e8faac42469e

Log messages:

DEBUG: (main.DockerComposeDriver.HandleRun) Exit status from run command: 0\n
2024/02/04 07:20:12 [ 5] DEBUG: (main.DockerComposeDriver.HandleRun) Collecting information from non default containers\n
2024/02/04 07:20:12 [ 8] ERROR: (main.DockerComposeDriver.getDCContainersNames) \x1b[31mUnexpected exit status:\n
Command: docker-compose -f ./test/test-files/itest-dc.yaml -f ./test/test-files/itest-dc.yaml.dojo -p testdojorunid ps --format json --all\n
Exit status: 1\n
StdOut: <empty string>\n
StdErr: Error response from daemon: No such container: 731492b22407b5d22db460ac5daee3a2e46e24286dfd4f6916b09457018eb66b\n
\x1b[0m\n
2024/02/04 07:20:12 [ 8] DEBUG: (main.DockerComposeDriver.waitForContainersToBeRunning) Containers not yet created: testdojorunid\n
2024/02/04 07:20:12 [ 5] DEBUG: (main.DockerComposeDriver.stop) Stopping containers\n
2024/02/04 07:20:12 [ 5]  INFO: (main.DockerComposeDriver.stop) Stopping containers with command: \n
docker-compose -f ./test/test-files/itest-dc.yaml -f ./test/test-files/itest-dc.yaml.dojo -p testdojorunid stop\n
Container testdojorunid-abc-1  Stopping\n
Container testdojorunid-abc-1  Stopped\n
2024/02/04 07:20:12 [ 5] DEBUG: (main.DockerComposeDriver.stop) Exit status from stop command: 0

xmik, Feb 04 '24 07:02

This is not fixed in Dojo 0.12.0. It happens rarely, and the workaround implemented in Dojo 0.10.3 is still in place: instead of panicking, we print a log message. However, this leads to flaky tests.
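One possible alternative to logging and moving on, sketched here only as an idea and not as what Dojo does, would be to treat the "No such container" response as transient and retry the `ps` call a few times before giving up. In the sketch below, `psFunc`, `listContainers` and `errNotReady` are hypothetical names; `psFunc` stands in for whatever actually runs `docker-compose ps`.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
	"time"
)

// psFunc stands in for the code that runs `docker-compose ps` and returns
// its stdout, stderr and exit status. (Hypothetical type, not part of Dojo.)
type psFunc func() (stdout string, stderr string, exitStatus int)

// errNotReady is returned when the containers never became listable.
var errNotReady = errors.New("containers not ready")

// listContainers calls ps up to attempts times. A non-zero exit status whose
// stderr contains "No such container" is assumed to be transient and is
// retried after delay; any other failure is returned immediately.
func listContainers(ps psFunc, attempts int, delay time.Duration) (string, error) {
	var lastErr error
	for i := 0; i < attempts; i++ {
		stdout, stderr, status := ps()
		if status == 0 {
			return stdout, nil
		}
		lastErr = fmt.Errorf("exit status %d: %s", status, strings.TrimSpace(stderr))
		if !strings.Contains(stderr, "No such container") {
			return "", lastErr // not the known transient case, do not retry
		}
		if i < attempts-1 {
			time.Sleep(delay)
		}
	}
	return "", fmt.Errorf("%w after %d attempts: %v", errNotReady, attempts, lastErr)
}
```

Whether matching on the stderr text is robust enough across docker-compose versions is an open question; the two logs above show slightly different wording around the same "No such container" message.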

xmik, Feb 04 '24 11:02