nvidia-docker: when I selected GPU device=3, it only used the first GPU (device=1)


Status: Open. PingYufeng opened this issue 3 years ago.

Host: Ubuntu 18.04, Docker 20.10.1

When I run tensorflow/serving:2.6.0-gpu with all GPUs, it only uses one of them:

sudo docker run -p 8500:8500 -p 8501:8501 --gpus 4 --mount type=bind,source=/home/building,target=/models/building -e MODEL_NAME=building -t tensorflow/serving:2.6.0-gpu --enable_batching=true --batching_parameters_file=/models/building/batching_parameters.txt &

What's more, when I selected GPU device=3 with the following option, it only used the first GPU (device=1):

docker run --gpus '"device=3"'
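For reference, the extra quoting in that option starts to matter once more than one device is listed: Docker parses the --gpus value as comma-separated options, so a list such as device=1,3 has to be wrapped in literal double quotes. A minimal sketch of a helper that builds the value for one or more host GPU indices (the function name gpus_arg is hypothetical, not part of Docker):

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the value for docker's --gpus flag from one or
# more host GPU indices. The result carries literal double quotes so that a
# comma-separated list is treated as a single "device=..." option.
gpus_arg() {
  local IFS=,   # "$*" joins the arguments with the first character of IFS
  printf '"device=%s"' "$*"
}

# Usage (commented out so the sketch runs without Docker or GPUs):
#   docker run --rm --gpus "$(gpus_arg 3)" ubuntu nvidia-smi     # same as '"device=3"'
#   docker run --rm --gpus "$(gpus_arg 1 3)" ubuntu nvidia-smi   # host GPUs 1 and 3
```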

Dec 12 '21 08:12 PingYufeng

Hi @PingYufeng, could you please provide the output of nvidia-smi on the host as well as in the container for the different situations you are describing, especially the case where device=3 is selected but only device 1 is available in the container?
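One way to gather that comparison: GPU indices are not guaranteed to match between host and container (CUDA, for instance, numbers the GPUs a process can see starting from 0), so matching GPUs by UUID or PCI bus ID is more reliable than matching by index. A minimal sketch, assuming the nvidia-smi query shown in the comments is available on both sides; the helper name host_index_for_uuid and the file name host_gpus.csv are hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: read "index, uuid, pci.bus_id" rows (nvidia-smi CSV
# output, fields separated by ", ") from stdin and print the host index of
# the GPU whose UUID matches $1.
host_index_for_uuid() {
  awk -F', ' -v uuid="$1" '$2 == uuid { print $1 }'
}

# Usage sketch (commented out; needs nvidia-smi):
#   # On the host: record every GPU with its index, UUID and PCI bus ID
#   nvidia-smi --query-gpu=index,uuid,pci.bus_id --format=csv,noheader > host_gpus.csv
#   # Inside the container started with --gpus '"device=3"': list what it sees
#   nvidia-smi --query-gpu=index,uuid,pci.bus_id --format=csv,noheader
#   # Map a UUID reported in the container back to its host index
#   host_index_for_uuid GPU-... < host_gpus.csv
```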

Looking at some TF Serving resources on the web, TF Serving seems to be targeted specifically at the single-GPU use case (see for example https://stephenweixu.medium.com/serving-multiple-ml-models-on-multiple-gpus-with-tensorflow-serving-fe2ade7aa16b). This suggests that the "first" GPU visible to the container will most likely always be used.
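If TF Serving does only use the first GPU it can see, one common workaround (an assumption here, not something confirmed in this thread or necessarily what the linked article does) is to run one container per GPU, so that each replica's "first" GPU is a different physical device. A minimal sketch reusing the mount, model name and batching flags from the original command; the container names and the host port scheme (8500 + 2*i for gRPC, 8501 + 2*i for REST) are hypothetical, and requests would still need to be balanced across the replicas:

```shell
#!/usr/bin/env bash
# Sketch: start one tensorflow/serving replica pinned to host GPU $1.
# Hypothetical choices: container name serving-gpu<i>, host ports
# 8500+2*i (gRPC) and 8501+2*i (REST). Set DRY_RUN=1 to print the
# docker command instead of running it.
start_replica() {
  local gpu="$1"
  local grpc_port=$((8500 + 2 * gpu))
  local rest_port=$((8501 + 2 * gpu))
  ${DRY_RUN:+echo} docker run -d --name "serving-gpu${gpu}" \
    --gpus "\"device=${gpu}\"" \
    -p "${grpc_port}:8500" -p "${rest_port}:8501" \
    --mount type=bind,source=/home/building,target=/models/building \
    -e MODEL_NAME=building \
    tensorflow/serving:2.6.0-gpu \
    --enable_batching=true \
    --batching_parameters_file=/models/building/batching_parameters.txt
}

# Usage (commented out so the sketch does not start containers):
#   for gpu in 0 1 2 3; do start_replica "$gpu"; done
```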

The question of the behaviour when you select a specific GPU is still valid.

Jan 10 '22 14:01 elezar
