
Add a Dockerfile including DCNv2 GPU compilation

Open Keiku opened this issue 3 years ago • 9 comments

Everyone seems to be having trouble with the GPU compilation of DCNv2, so I added a Dockerfile that works correctly.

I have confirmed that it works in the following environment.

⋊> ~ cat /etc/os-release | grep PRETTY_NAME
PRETTY_NAME="Ubuntu 18.04.2 LTS"
⋊> ~ docker --version
Docker version 19.03.5, build 633a0ea838
⋊> ~ docker-compose -v
docker-compose version 1.25.3, build unknown
⋊> ~ docker info | grep -i runtime
WARNING: No swap limit support
 Runtimes: nvidia runc
 Default Runtime: nvidia
⋊> ~ 

If you want to use CUDA when building the Docker container, you need to set the following in daemon.json.

⋊> ~ cat /etc/docker/daemon.json
{
    "default-runtime": "nvidia",
    "runtimes": {
    "nvidia": {
      "path": "/usr/bin/nvidia-container-runtime",
      "runtimeArgs": []
    }
  }
}
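
As a sanity check (this is my own suggestion, not something from the repository), you can validate the daemon.json syntax before restarting Docker, since a malformed file prevents the daemon from starting at all. The snippet below checks an inline copy of the configuration; on a real host you would read /etc/docker/daemon.json instead.

```python
import json

# Inline copy of the daemon.json shown above; on a real host, read
# /etc/docker/daemon.json instead of this string.
daemon_json = """
{
    "default-runtime": "nvidia",
    "runtimes": {
        "nvidia": {
            "path": "/usr/bin/nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}
"""

config = json.loads(daemon_json)  # raises ValueError on malformed JSON
assert config["default-runtime"] == "nvidia"
print("daemon.json OK, default runtime:", config["default-runtime"])
```

After editing the real file, restart the daemon (for example with sudo systemctl restart docker) so the change takes effect.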

After preparing the above environment, build and start the container with the following command.

docker-compose up -d dev

You cannot build with CUDA unless you add the following to docker-compose.yaml.

    runtime: nvidia
    environment:
      - NVIDIA_VISIBLE_DEVICES=all
      - NVIDIA_DRIVER_CAPABILITIES=all
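
For reference, a minimal docker-compose.yaml with these keys in place might look like the following sketch. The service name `dev` and image name `centertrack_dev` match the commands in this post, but the build context is an assumption, not taken from the repository.

```yaml
version: "2.3"  # the "runtime:" key requires compose file format 2.3 or 2.4
services:
  dev:
    build: .                 # assumes the Dockerfile sits next to this file
    image: centertrack_dev   # image name used in the docker run example below
    runtime: nvidia
    environment:
      - NVIDIA_VISIBLE_DEVICES=all
      - NVIDIA_DRIVER_CAPABILITIES=all
```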

For Docker 19.03 and later, use the --gpus all option. Run the docker run command as follows.

docker run --gpus all --ipc=host --rm -it \
         -v /home/keiichi.kuroyanagi/datasets/:/CenterTrack/data/ \
         -v /home/keiichi.kuroyanagi/pretrained_models/:/CenterTrack/models/ \
         centertrack_dev

Keiku avatar Dec 23 '20 06:12 Keiku

Thanks for your Dockerfile!

I had a problem with "qt.qpa.xcb: could not connect to display".

Can you help me?


root@a2118cd70918:/CenterTrack/src# python demo.py tracking,ddd --load_model ../models/nuScenes_3Dtracking.pth --dataset nuscenes --pre_hm --track_thresh 0.1 --demo ../videos/nuscenes_mini.mp4 --test_focal_length 633
/usr/local/lib/python3.6/dist-packages/sklearn/utils/linear_assignment_.py:22: FutureWarning: The linear_assignment_ module is deprecated in 0.21 and will be removed from 0.23. Use scipy.optimize.linear_sum_assignment instead.
  FutureWarning)
Running tracking
Using tracking threshold for out threshold! 0.1
Fix size testing.
training chunk_sizes: [32]
input h w: 448 800
heads {'hm': 10, 'reg': 2, 'wh': 2, 'tracking': 2, 'dep': 1, 'rot': 8, 'dim': 3, 'amodel_offset': 2}
weights {'hm': 1, 'reg': 1, 'wh': 0.1, 'tracking': 1, 'dep': 1, 'rot': 1, 'dim': 1, 'amodel_offset': 1}
head conv {'hm': [256], 'reg': [256], 'wh': [256], 'tracking': [256], 'dep': [256], 'rot': [256], 'dim': [256], 'amodel_offset': [256]}
Creating model...
Using node type: (<class 'model.networks.dla.DeformConv'>, <class 'model.networks.dla.DeformConv'>)
Warning: No ImageNet pretrain!!
loaded ../models/nuScenes_3Dtracking.pth, epoch 70
out_name nuscenes_mini.mp4
qt.qpa.xcb: could not connect to display
qt.qpa.plugin: Could not load the Qt platform plugin "xcb" in "/usr/local/lib/python3.6/dist-packages/cv2/qt/plugins" even though it was found.
This application failed to start because no Qt platform plugin could be initialized. Reinstalling the application may fix this problem.

Available platform plugins are: xcb.

ahyunlee avatar Jan 28 '21 02:01 ahyunlee

@ahyunlee Since the Docker environment does not have a display, please modify demo.py so that it does not use the display.
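
For example, one way to do that (a sketch of my own; safe_imshow is a hypothetical helper, not code from the repository) is to guard the cv2.imshow calls on the DISPLAY environment variable, which is unset in a typical headless container:

```python
import os

def display_available() -> bool:
    # In a headless Docker container no X server is reachable, so
    # cv2.imshow would fail with the Qt "xcb" plugin error shown above.
    return bool(os.environ.get("DISPLAY"))

def safe_imshow(window_name, image):
    # Hypothetical wrapper: only import and call cv2.imshow when a
    # display exists; otherwise skip visualization entirely.
    if display_available():
        import cv2
        cv2.imshow(window_name, image)

# In demo.py, replace direct cv2.imshow(...) calls with safe_imshow(...).
```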

Keiku avatar Feb 01 '21 06:02 Keiku

@ahyunlee Since the Docker environment does not have a display, please modify demo.py so that it does not use the display.

Thanks! It worked! I commented 'cv2.imshow' in all files.

ahyunlee avatar Feb 05 '21 04:02 ahyunlee


Hey @Keiku, thank you for kindly sharing your work on containerizing CenterTrack. I'm facing the same trouble with the GPU compilation of DCNv2, and I've tried your Dockerfile and docker-compose without success. Can you share an updated version of these files?

fabio-cancio-sena avatar Jul 12 '21 16:07 fabio-cancio-sena

@fabio-cancio-sena Please tell me your Docker version. By the way, in my understanding, nvidia-docker2 is unnecessary; the NVIDIA Container Toolkit is required instead. You can check with commands like docker --version and nvidia-container-cli -V.

⋊> ~ docker --version
Docker version 19.03.5, build 633a0ea838
⋊> ~ nvidia-container-cli -V
version: 1.3.0
build date: 2020-09-16T12:32+00:00
build revision: 16315ebdf4b9728e899f615e208b50c41d7a5d15
build compiler: x86_64-linux-gnu-gcc-7 7.5.0
build platform: x86_64
build flags: -D_GNU_SOURCE -D_FORTIFY_SOURCE=2 -DNDEBUG -std=gnu11 -O2 -g -fdata-sections -ffunction-sections -fstack-protector -fno-strict-aliasing -fvisibility=hidden -Wall -Wextra -Wcast-align -Wpointer-arith -Wmissing-prototypes -Wnonnull -Wwrite-strings -Wlogical-op -Wformat=2 -Wmissing-format-attribute -Winit-self -Wshadow -Wstrict-prototypes -Wunreachable-code -Wconversion -Wsign-conversion -Wno-unknown-warning-option -Wno-format-extra-args -Wno-gnu-alignof-expression -Wl,-zrelro -Wl,-znow -Wl,-zdefs -Wl,--gc-sections
⋊> ~ 

Keiku avatar Jul 12 '21 21:07 Keiku

docker --version

I see. Do you have an updated version of your Dockerfile? With your old Dockerfile, I'm having trouble building DCN locally, and torch reports that the GPU is not available.

Here are the software versions:

docker --version

Docker version 20.10.2, build 20.10.2-0ubuntu1~18.04.2

nvidia-container-cli -V

version: 1.4.0
build date: 2021-04-24T14:25+00:00
build revision: 704a698b7a0ceec07a48e56c37365c741718c2df
build compiler: x86_64-linux-gnu-gcc-7 7.5.0
build platform: x86_64
build flags: -D_GNU_SOURCE -D_FORTIFY_SOURCE=2 -DNDEBUG -std=gnu11 -O2 -g -fdata-sections -ffunction-sections -fstack-protector -fno-strict-aliasing -fvisibility=hidden -Wall -Wextra -Wcast-align -Wpointer-arith -Wmissing-prototypes -Wnonnull -Wwrite-strings -Wlogical-op -Wformat=2 -Wmissing-format-attribute -Winit-self -Wshadow -Wstrict-prototypes -Wunreachable-code -Wconversion -Wsign-conversion -Wno-unknown-warning-option -Wno-format-extra-args -Wno-gnu-alignof-expression -Wl,-zrelro -Wl,-znow -Wl,-zdefs -Wl,--gc-sections

fabio-cancio-sena avatar Jul 13 '21 04:07 fabio-cancio-sena

@fabio-cancio-sena I also reproduced the error in the following environment. It has been a long time since I set this up, so I can't resolve the error right now. I will try to fix it as soon as I have free time. Please try it yourself for the time being.

⋊> ~ docker --version
Docker version 20.10.7, build f0df350
⋊> ~ nvidia-container-cli -V
version: 1.4.0
build date: 2021-04-24T14:26+00:00
build revision: 704a698b7a0ceec07a48e56c37365c741718c2df
build compiler: gcc-5 5.4.0 20160609
build platform: x86_64
build flags: -D_GNU_SOURCE -D_FORTIFY_SOURCE=2 -DNDEBUG -std=gnu11 -O2 -g -fdata-sections -ffunction-sections -fstack-protector -fno-strict-aliasing -fvisibility=hidden -Wall -Wextra -Wcast-align -Wpointer-arith -Wmissing-prototypes -Wnonnull -Wwrite-strings -Wlogical-op -Wformat=2 -Wmissing-format-attribute -Winit-self -Wshadow -Wstrict-prototypes -Wunreachable-code -Wconversion -Wsign-conversion -Wno-unknown-warning-option -Wno-format-extra-args -Wno-gnu-alignof-expression -Wl,-zrelro -Wl,-znow -Wl,-zdefs -Wl,--gc-sections

Keiku avatar Jul 13 '21 05:07 Keiku

Has anyone encountered this error?

/usr/local/lib/python3.6/dist-packages/torch/utils/cpp_extension.py:352: UserWarning: Attempted to use ninja as the BuildExtension backend but we could not find ninja.. Falling back to using the slow distutils backend.
  warnings.warn(msg.format('we could not find ninja.'))
error: command 'g++' failed with exit status 1

I am not sure whether it is caused by the PyTorch version.

pip3 install torch==1.7.1+cu101 torchvision==0.8.2+cu101 torchaudio==0.7.2 -f https://download.pytorch.org/whl/torch_stable.html

I checked the docker info, and it is the same as yours.

Any thoughts on this? I appreciate your help in advance.

yktangac avatar Sep 24 '21 02:09 yktangac

Note that you need to run systemctl reload docker after setting the default runtime.

elkoz avatar Aug 18 '22 05:08 elkoz