
Documentation for wrapping docker with task_python_bin

Open · max-nova opened this issue 6 years ago · 2 comments

I've seen references throughout the mrjob issues and codebase to running tasks inside docker (e.g. https://github.com/Yelp/mrjob/issues/1394), but I can't seem to find any real documentation on how one might do this. Does mrjob need to be installed within the docker container? Does the docker container need to share directories somehow?

I'd be happy to write the docs for this once I understand what's going on. Could hop on a call or meet up for lunch in SF to discuss?

max-nova · Nov 27 '18 03:11

I didn't write the Docker wrapper that Yelp uses, so apologies if this is somewhat vague!

The basic idea is to set python_bin to something that runs docker run. For example, --python-bin 'docker run <image> python'.
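For instance, a full invocation might look something like this (mr_word_count.py and my-image are placeholder names):

python mr_word_count.py -r hadoop --python-bin 'docker run my-image python' input.txt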

However, the container will also need your current working directory (which contains, among other things, the mrjob script itself), so you really want your python_bin to look more like:

docker run -v $PWD:/working_dir -w /working_dir <image> python

However, there are a couple of problems. First, you can't just mount the working directory inside Hadoop, because Hadoop populates it with symlinks, which won't resolve inside the container. Second, running in attached mode means piping all your input and output through dockerd, which is rather slow.

Both of these can be solved by making a wrapper script (docker_wrapper.sh). The script makes a copy of your working directory (e.g. using rsync), sets up fifos to pipe stdin, stdout, and stderr in and out without involving dockerd, runs your job in detached mode, and then waits for it to finish. Then you set python_bin to something like sh docker_wrapper.sh (on EMR, you may have to use a bootstrap script to place docker_wrapper.sh somewhere Spark can find it).
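To make those steps concrete, here's a minimal sketch of the setup half of such a wrapper (COPY_OF_PWD and the fifo names match the snippet below; rsync's -L flag is just one way to dereference Hadoop's symlinks):

# copy the task's working dir; -L copies the targets of symlinks
# rather than the symlinks themselves
COPY_OF_PWD=$(mktemp -d) || exit 1
rsync -aL "$PWD"/ "$COPY_OF_PWD"/

# fifos for piping stdin/stdout/stderr without going through dockerd
mkfifo "$COPY_OF_PWD"/stdin_fifo "$COPY_OF_PWD"/stdout_fifo "$COPY_OF_PWD"/stderr_fifo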

The final command line looks something like:

cat "$COPY_OF_PWD"/stdout_fifo &
cat "$COPY_OF_PWD"/stderr_fifo >&2 &
container_id=$(docker run -d -v "$COPY_OF_PWD":/working_dir -w /working_dir "$DOCKER_IMAGE" sh -c "python $* < stdin_fifo > stdout_fifo 2> stderr_fifo")
cat > "$COPY_OF_PWD"/stdin_fifo

(In this case, we made fifos in our copy of the working dir just to keep things simple.)

And then you need to monitor the docker container for completion (e.g. using docker inspect). If the job fails, you'll need to do something intelligent with the relevant logs (e.g. dump them to stderr) and exit with the same (nonzero) return code.
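One way to handle that last part (a sketch, not Yelp's actual wrapper, using docker wait rather than polling docker inspect):

# block until the container exits; docker wait prints its exit code
cmd_status=$(docker wait "$container_id")

if [ "$cmd_status" -ne 0 ]; then
    # dump the container's logs so the failure shows up in the task's stderr
    docker logs "$container_id" >&2
fi

exit "$cmd_status"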

Anyways, as you can see, doing this correctly gets rather elaborate. Ideally, this should be a feature built into mrjob.

coyotemarin · Dec 14 '18 21:12

Thanks for your very helpful in-depth response! This pointed me in the right direction.

I was having some trouble figuring out the mrjob working directory code, so I ended up just pip-installing my code both outside and inside docker and then overriding _execute() in the runners to call it as a module (with the -m flag). I also start the docker container in my bootstrap script and then use docker exec rather than docker run, to avoid the perf hit of spinning up a new docker container every time.
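For reference, the bootstrap step could be as simple as something like this (the container name, image, and mount paths are placeholders chosen to line up with the script below):

# start one long-lived container per node at bootstrap time, bind-mounting
# the host dir the wrapper uses for its fifos; `sleep infinity` just keeps
# the container alive (assumes the image's sleep supports it)
sudo docker run -d --name mrjob-task -v /mnt/docker-mnt:/mnt/vol my-image sleep infinity

# the wrapper below expects $DOCKER_CONTAINER to identify this container
# (e.g. set it for tasks via mrjob's cmdenv option)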

Anyways, here's the script that ended up working for me. Hopefully it's useful to others on the interwebs trying to figure this out too.

Would also be happy to work on a more general solution for inclusion in mrjob. I don't think I'm quite at a sufficient level of understanding of all the moving parts yet, though (especially the working directory stuff). There's also some fancy footwork I do in my library with dynamic imports via importlib, so that the host context doesn't have to have all the dependencies that the docker context has. Not sure how you'd want to handle that in mrjob.

#!/usr/bin/env bash

if [ -z "$DOCKER_CONTAINER" ]
then
    (>&2 echo "DOCKER_CONTAINER env var is undefined")
    exit 1
fi

HOST_MNT_DIR=/mnt/docker-mnt
CONTAINER_MNT_DIR=/mnt/vol

# make a temporary directory under the bind-mounted host dir
HOST_FIFO_DIR=$(mktemp -d -p "$HOST_MNT_DIR") || exit 1
CONTAINER_FIFO_DIR=$CONTAINER_MNT_DIR/$(basename "$HOST_FIFO_DIR")

# create fifos for stdin/stdout/stderr
mkfifo "$HOST_FIFO_DIR"/stdin_fifo "$HOST_FIFO_DIR"/stdout_fifo "$HOST_FIFO_DIR"/stderr_fifo

# cat the stdout/stderr fifos to begin watching for output
cat "$HOST_FIFO_DIR"/stdout_fifo &
cat "$HOST_FIFO_DIR"/stderr_fifo >&2 &

# execute the command in the docker container, passing stdin/stdout/stderr
# thru the fifos. run docker exec attached but in the background: with
# `docker exec -d` the client returns immediately, so $! would point at the
# wrong process and the command's exit code would be lost
sudo docker container exec -w "$CONTAINER_FIFO_DIR" "$DOCKER_CONTAINER" sh -c "python $* < stdin_fifo > stdout_fifo 2> stderr_fifo" &
cmd_pid=$!

# feed docker-wrapper.sh's stdin to the stdin_fifo
cat > "$HOST_FIFO_DIR"/stdin_fifo

# wait for the command to finish; docker exec propagates its exit code
wait $cmd_pid
cmd_status=$?

# let the background cats finish draining the output fifos
wait

# cleanup and exit
rm -rf "$HOST_FIFO_DIR"
exit "$cmd_status"

Also, I wanted to note that I originally hadn't used FIFOs and was just setting task_python_bin to docker container exec -i {container} python, which actually seemed to work fine for the most part, especially with -r local. But on EMR, it would run through a lot of mappers/reducers and then seem to hang. I spent a few days tracking this down, and it turned out it was always mappers or reducers that were receiving no input. My suspicion is that docker was buffering stdin and never delivered EOF when there was no input, so those mappers/reducers would just hang until they were killed.

max-nova · Dec 19 '18 15:12