deimos
Container flaps between 'Staging' and 'Running' => can't download a package
Hi,
Starting a simple container via Marathon/Deimos fails because, for some reason, it cannot fetch a .jar file hosted on S3. The Docker image is downloaded correctly by a slave from the public Docker repo and can be run manually with no problems. The app inside the container is a simple 'hello-world' type Java app.
Details:
- mesos: 0.19.1
- deimos: 0.4.0
- marathon: 0.6.0-1.0
- ubuntu: 14.04 trusty
docker image: tnolet/hello1

Dockerfile:

FROM ubuntu:latest
MAINTAINER Tim Nolet
RUN apt-get update -y
RUN apt-get install -y --no-install-recommends openjdk-7-jre
ENV JAVA_HOME /usr/lib/jvm/java-7-openjdk-amd64
RUN apt-get install -y curl
RUN curl -sf -O https://s3-eu-west-1.amazonaws.com/deploy.magnetic.io/snapshots/dropwizard-0.0.1-SNAPSHOT.jar
RUN curl -sf -O https://s3-eu-west-1.amazonaws.com/deploy.magnetic.io/snapshots/hello-world.yml
EXPOSE 8080
EXPOSE 8081
ENV SERVICE hello:0.0.1:8080:8081
CMD java -jar dropwizard-0.0.1-SNAPSHOT.jar server hello-world.yml
task file:
{ "container": { "image": "docker:///tnolet/hello1", "options" : [] }, "id": "hello1", "instances": "1", "cpus": ".5", "mem": "512", "uris": [], "cmd": "" }
Error in stderr in the Mesos GUI:
Error: Unable to access jarfile dropwizard-0.0.1-SNAPSHOT.jar
Output from mesos-slave.INFO on the slave:
I0731 13:09:21.673143 8814 slave.cpp:1664] Got registration for executor 'hello1.e0be7ca8-18b3-11e4-a08d-0a4559673eab' of framework 20140731-110416-606019500-5050-1090-0000
I0731 13:09:21.673703 8814 slave.cpp:1783] Flushing queued task hello1.e0be7ca8-18b3-11e4-a08d-0a4559673eab for executor 'hello1.e0be7ca8-18b3-11e4-a08d-0a4559673eab' of framework 20140731-110416-606019500-5050-1090-0000
I0731 13:09:21.695307 8814 slave.cpp:2018] Handling status update TASK_RUNNING (UUID: 4e704272-eecd-4205-819c-a2eb63048c18) for task hello1.e0be7ca8-18b3-11e4-a08d-0a4559673eab of framework 20140731-110416-606019500-5050-1090-0000 from executor(1)@172.31.31.38:49678
I0731 13:09:21.695582 8814 status_update_manager.cpp:320] Received status update TASK_RUNNING (UUID: 4e704272-eecd-4205-819c-a2eb63048c18) for task hello1.e0be7ca8-18b3-11e4-a08d-0a4559673eab of framework 20140731-110416-606019500-5050-1090-0000
I0731 13:09:21.695897 8814 status_update_manager.cpp:373] Forwarding status update TASK_RUNNING (UUID: 4e704272-eecd-4205-819c-a2eb63048c18) for task hello1.e0be7ca8-18b3-11e4-a08d-0a4559673eab of framework 20140731-110416-606019500-5050-1090-0000 to [email protected]:5050
I0731 13:09:21.696854 8815 slave.cpp:2145] Sending acknowledgement for status update TASK_RUNNING (UUID: 4e704272-eecd-4205-819c-a2eb63048c18) for task hello1.e0be7ca8-18b3-11e4-a08d-0a4559673eab of framework 20140731-110416-606019500-5050-1090-0000 to executor(1)@172.31.31.38:49678
I0731 13:09:21.702631 8812 status_update_manager.cpp:398] Received status update acknowledgement (UUID: 4e704272-eecd-4205-819c-a2eb63048c18) for task hello1.e0be7ca8-18b3-11e4-a08d-0a4559673eab of framework 20140731-110416-606019500-5050-1090-0000
I0731 13:09:21.859962 8816 slave.cpp:2355] Monitoring executor 'hello1.e0be7ca8-18b3-11e4-a08d-0a4559673eab' of framework '20140731-110416-606019500-5050-1090-0000' in container 'e4c7ca90-0dff-4492-b3c4-e6c7569f1eeb'
I0731 13:09:22.687067 8813 slave.cpp:2018] Handling status update TASK_FAILED (UUID: 2d405edf-32a0-493e-aea0-d2d8d4cc1f9c) for task hello1.e0be7ca8-18b3-11e4-a08d-0a4559673eab of framework 20140731-110416-606019500-5050-1090-0000 from executor(1)@172.31.31.38:49678
I0731 13:09:22.698246 8811 status_update_manager.cpp:320] Received status update TASK_FAILED (UUID: 2d405edf-32a0-493e-aea0-d2d8d4cc1f9c) for task hello1.e0be7ca8-18b3-11e4-a08d-0a4559673eab of framework 20140731-110416-606019500-5050-1090-0000
I0731 13:09:22.699434 8811 status_update_manager.cpp:373] Forwarding status update TASK_FAILED (UUID: 2d405edf-32a0-493e-aea0-d2d8d4cc1f9c) for task hello1.e0be7ca8-18b3-11e4-a08d-0a4559673eab of framework 20140731-110416-606019500-5050-1090-0000 to [email protected]:5050
I0731 13:09:22.700186 8811 slave.cpp:2145] Sending acknowledgement for status update TASK_FAILED (UUID: 2d405edf-32a0-493e-aea0-d2d8d4cc1f9c) for task hello1.e0be7ca8-18b3-11e4-a08d-0a4559673eab of framework 20140731-110416-606019500-5050-1090-0000 to executor(1)@172.31.31.38:49678
I0731 13:09:22.709666 8815 status_update_manager.cpp:398] Received status update acknowledgement (UUID: 2d405edf-32a0-493e-aea0-d2d8d4cc1f9c) for task hello1.e0be7ca8-18b3-11e4-a08d-0a4559673eab of framework 20140731-110416-606019500-5050-1090-0000
I0731 13:09:23.060930 8814 slave.cpp:933] Got assigned task hello1.e3b9e259-18b3-11e4-a08d-0a4559673eab for framework 20140731-110416-606019500-5050-1090-0000
I0731 13:09:23.061293 8814 slave.cpp:1043] Launching task hello1.e3b9e259-18b3-11e4-a08d-0a4559673eab for framework 20140731-110416-606019500-5050-1090-0000
I0731 13:09:23.063863 8815 external_containerizer.cpp:433] Launching container 'cfb86a26-2821-49ce-95a0-3e4d0dfd8657'
I0731 13:09:23.080337 8814 slave.cpp:1153] Queuing task 'hello1.e3b9e259-18b3-11e4-a08d-0a4559673eab' for executor hello1.e3b9e259-18b3-11e4-a08d-0a4559673eab of framework '20140731-110416-606019500-5050-1090-0000
E0731 13:09:23.859387 8811 slave.cpp:2397] Termination of executor 'hello1.e0be7ca8-18b3-11e4-a08d-0a4559673eab' of framework '20140731-110416-606019500-5050-1090-0000' failed: External containerizer failed (status: 1)
I0731 13:09:23.859632 8811 slave.cpp:2552] Cleaning up executor 'hello1.e0be7ca8-18b3-11e4-a08d-0a4559673eab' of framework 20140731-110416-606019500-5050-1090-0000
I0731 13:09:23.860239 8811 gc.cpp:56] Scheduling '/tmp/mesos/slaves/20140731-110416-606019500-5050-1090-2/frameworks/20140731-110416-606019500-5050-1090-0000/executors/hello1.e0be7ca8-18b3-11e4-a08d-0a4559673eab/runs/e4c7ca90-0dff-4492-b3c4-e6c7569f1eeb' for gc 6.99999004926815days in the future
I0731 13:09:23.860345 8811 gc.cpp:56] Scheduling '/tmp/mesos/slaves/20140731-110416-606019500-5050-1090-2/frameworks/20140731-110416-606019500-5050-1090-0000/executors/hello1.e0be7ca8-18b3-11e4-a08d-0a4559673eab' for gc 6.99999004838222days in the future
I0731 13:09:23.871316 8816 external_containerizer.cpp:1040] Killed the following process tree/s: [ ]
Again, running the following command on the slave manually starts up the container with no problems:
sudo docker run -d -P tnolet/hello1
I found the cause of this behaviour. The flapping happens when artifacts or executables inside Docker containers are not referenced by their full path names. Because Deimos adds the -w /tmp/mesos-sandbox switch to set the working directory in Docker, all relative paths are off...
Not sure if this is a bug or just something people should be aware of.
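To make the difference concrete, the launch Deimos performs ends up looking roughly like this (the flags shown are illustrative, not Deimos's exact invocation, and the host-side sandbox path is a placeholder):

# Deimos mounts the task sandbox into the container and makes it the working directory,
# so the relative paths in the image's CMD now resolve under /tmp/mesos-sandbox:
docker run -v <host sandbox dir>:/tmp/mesos-sandbox -w /tmp/mesos-sandbox tnolet/hello1

# whereas the manual test uses the image's default working directory (/),
# which is where the RUN curl -O steps left the jar and the yml at build time:
docker run -d -P tnolet/hello1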
This is similar to https://github.com/mesosphere/deimos/issues/49
I'm just not sure what the right thing to do is. Deimos puts URLs from the Mesos task in a directory which it mounts at /tmp/mesos-sandbox, so tasks can find the downloaded contents. It seems reasonable to set the working directory to that directory, too, so that frameworks which are unaware of Docker will still find the URLs they expect.
There is a patch under #49 to acknowledge the WORKDIR directive, but I do wonder if there is a better policy in general.
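If Deimos did honour WORKDIR, an image like this one could keep its relative paths by declaring the directory it expects to run from. A minimal sketch (assuming the curl -O downloads end up in /, the default working directory during the build):

WORKDIR /
CMD java -jar dropwizard-0.0.1-SNAPSHOT.jar server hello-world.yml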
Having Deimos dump the URLs in the "right place" could perhaps be accomplished by:
- Downloading the files
- Building a new image that inherits from the original one, using RUN to copy the downloaded files in (see the sketch below)
- Running the new image
Hopefully ENTRYPOINT and CMD and all that would be preserved in the new image.
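A minimal sketch of that approach, with hypothetical file and tag names; it uses ADD in place of RUN, since ADD is what copies files from the build context into an image, and a child image inherits ENTRYPOINT, CMD and ENV from its parent unless it overrides them:

# generated Dockerfile, after fetching the task's URIs into ./downloads/
FROM tnolet/hello1
ADD downloads/ /tmp/mesos-sandbox/

# then, roughly:
docker build -t tnolet/hello1-sandboxed .
docker run -w /tmp/mesos-sandbox tnolet/hello1-sandboxed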
I see the problem. I guess if everyone is fully aware that this is happening, it's not a big problem. Making your paths and URLs fully qualified isn't always a nice way of handling things, but there are ways around it, and in the end it's not a biggie.
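For completeness, this is roughly what fully qualifying the paths looks like for this image. A sketch only: it assumes the two RUN curl -O downloads land in / (the default working directory during the build), so the CMD no longer depends on the -w that Deimos adds:

CMD java -jar /dropwizard-0.0.1-SNAPSHOT.jar server /hello-world.yml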