Autoware.Universe Docker Installation Procedures are wrong

Open Croquembouche opened this issue 2 years ago • 11 comments

Checklist

  • [X] I've read the contribution guidelines.
  • [X] I've searched other issues and no duplicate issues were found.
  • [X] I'm convinced that this is not my fault but a bug.

Description

Hi,

The current steps for installing Autoware.Universe via Docker are wrong. The manual is hosted on Autoware's website and not here. Can I post the correct steps here so that someone can update the documentation on the Autoware side?

Expected behavior

Autoware should install and build correctly.

Actual behavior

Installing with the current documentation results in the following errors:

  • CUDA not found
  • cuDNN not found
  • TensorRT not found
  • nvidia-smi and CUDA version mismatch
  • unable to upgrade CUDA tools due to invalid cross link device
  • rocker no longer has the --cuda option
  • colcon build fails on C++ compilation due to memory constraints

Steps to reproduce

Follow documentation

Versions

  • OS: Ubuntu 20.04
  • CUDA: 11.6
  • Rocker: 0.2.10

Possible causes

Stated Above.

Additional context

Stated Above.

Croquembouche avatar Aug 28 '22 21:08 Croquembouche

Thank you for opening this issue. I found 2 different instruction pages about running Autoware with Docker:

  • Documentation repository: https://github.com/autowarefoundation/autoware-documentation/blob/main/docs/installation/autoware/docker-installation.md
  • Autoware repository: https://github.com/autowarefoundation/autoware/blob/main/docker/README.md

~Your issue seems to be that the instructions on the Autoware repository are wrong~. Is my understanding correct? EDIT: both instruction pages are apparently wrong.

To avoid future confusion, I propose maintaining only one page of documentation, in the Documentation repository. The one in the Autoware repository can then be deleted or reduced to a link to the page in the Documentation repository.

maxime-clem avatar Aug 28 '22 22:08 maxime-clem

Hi Max,

Both of these documents are wrong. They use the wrong option for rocker and the wrong Autoware Docker image.

The part to verify that docker is installed correctly is also wrong because NVIDIA changed their wording recently.

Yes, I think there should be only one installation document to avoid confusion.

Croquembouche avatar Aug 29 '22 13:08 Croquembouche

Can you please post more details about the issues you had and how you fixed them?

The images were recently updated and now need to be suffixed with -cuda (you can find the list of images here: https://github.com/autowarefoundation/autoware/pkgs/container/autoware-universe/versions?filters%5Bversion_type%5D=tagged). Could this have caused some of your issues?

Once you share more details I will try to reproduce the issues.
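
As a rough illustration of the -cuda suffix, the following shell sketch builds an image reference and prints the pull command rather than running it. The helper name cuda_tag is hypothetical, and galactic-latest is only an example tag; the package list linked above shows which tags actually exist.

```shell
#!/bin/sh
# Hypothetical helper: append "-cuda" to an image tag unless it is already there.
cuda_tag() {
    case "$1" in
        *-cuda) printf '%s\n' "$1" ;;
        *)      printf '%s-cuda\n' "$1" ;;
    esac
}

IMAGE="ghcr.io/autowarefoundation/autoware-universe"

# Print the pull command instead of running it, so the sketch has no side effects.
echo "docker pull ${IMAGE}:$(cuda_tag galactic-latest)"
```

Printing the command keeps the sketch safe to try on a machine without Docker; copy the printed line to perform the actual pull.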

maxime-clem avatar Aug 31 '22 03:08 maxime-clem

Hi @maxime-clem @Croquembouche ,

Did you eventually find a way to install Autoware correctly?

Thanks

msanchezvicom avatar Sep 05 '22 10:09 msanchezvicom

I followed the instructions from https://github.com/autowarefoundation/autoware-documentation/blob/main/docs/installation/autoware/docker-installation.md with a fresh clone. I installed the dependencies using Ansible and ran on an amd64 architecture with an NVIDIA GPU.

I had no issues until building Autoware, which failed with the following error caused by missing dependencies:

--- stderr: tier4_pcl_extensions                                                  
In file included from /home/mclement/autoware/test_docker/src/universe/autoware.universe/sensing/tier4_pcl_extensions/src/voxel_grid_nearest_centroid.cpp:52:
/home/mclement/autoware/test_docker/src/universe/autoware.universe/sensing/tier4_pcl_extensions/include/tier4_pcl_extensions/voxel_grid_nearest_centroid_impl.hpp:66:10: fatal error: range/v3/all.hpp: No such file or directory
   66 | #include <range/v3/all.hpp>
      |          ^~~~~~~~~~~~~~~~~~

I installed the missing dependencies using the following commands.

sudo apt update
source /opt/ros/galactic/setup.bash
rosdep update
rosdep install --from-paths . --ignore-src --rosdistro $ROS_DISTRO

After that I could build without error and could run Autoware without any problem.

@Croquembouche please share more details about the issues you encountered. It is possible that there are issues when installing dependencies without Ansible, when running without an NVIDIA GPU, or when using an arm64 architecture.

For now I will open a PR to add the commands needed to install missing dependencies.

maxime-clem avatar Sep 05 '22 10:09 maxime-clem

Hi,

Sorry for the late response. I tried the same instructions, but updating the dependencies gave me "invalid-cross device id" when installing the NVIDIA drivers, CUDA, and TensorRT. A manual install resulted in the same error.

The cause I found is that rocker's --cuda option is not available in version 0.2.10.

I'll upload the instructions that worked for me tomorrow when I get to the lab.

Best, William

Croquembouche avatar Sep 06 '22 23:09 Croquembouche

You are right that rocker does not support the --cuda option, but I am also using version 0.2.10 and did not have any issues. I cannot find any use of the --cuda option in the commands listed in the instructions, but maybe they were changed recently. I am not sure what the invalid-cross device id could be. I will wait for more details.

maxime-clem avatar Sep 07 '22 00:09 maxime-clem

Hi,

The invalid cross device id error appears when you update the NVIDIA drivers inside the Docker container. Rocker uses the GPU data from the host PC, while the Autoware Docker image uses CUDA 10.1 (the latest CUDA version is 11.7) and needs to be updated. The installer therefore tries to update the driver on the host PC, which results in the invalid cross device id error.

This problem has been solved in another issue.

I'll attach the docker image that worked for me and the steps.

Croquembouche avatar Sep 07 '22 17:09 Croquembouche

Steps to use Autoware.Universe in Docker (with rocker):

  1. Use the image ghcr.io/autowarefoundation/autoware-universe:galactic-latest-cuda. It includes CUDA 11.6, TensorRT, and cuDNN.
  2. Start rocker with the command "rocker --nvidia --x11 --user --env NVIDIA_DRIVER_CAPABILITIES="" -- ghcr.io/autowarefoundation/autoware-universe:galactic-latest-cuda". This allows the host GPU to be used inside the container.
  3. You need at least 64 GB of free RAM to colcon build the project. With less free memory, the build may fail with seemingly random C++ errors: "colcon build" throws an error and terminates when the PC runs out of memory. The "Add SWAP memory" section in the documentation can be used to work around this.
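
A minimal sketch of a pre-build memory check for step 3, assuming a Linux system with /proc/meminfo. The 64 GB figure comes from the step above; counting available RAM plus free swap as the budget is an assumption, and the function names and the optional meminfo path argument are hypothetical (the path argument exists only so the check can be tried against a sample file).

```shell
#!/bin/sh
# Print available memory (MemAvailable + SwapFree) in GiB, rounded down.
# $1: path to a meminfo-style file (default: /proc/meminfo).
available_gib() {
    awk '/^MemAvailable:|^SwapFree:/ { kib += $2 }
         END { printf "%d\n", kib / 1048576 }' "${1:-/proc/meminfo}"
}

# Succeed if at least $1 GiB are available according to the meminfo file $2.
has_enough_memory() {
    [ "$(available_gib "${2:-/proc/meminfo}")" -ge "$1" ]
}
```

Running has_enough_memory 64 inside the container before colcon build gives an early warning; if it fails, adding swap as described in the documentation is one option.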

Croquembouche avatar Sep 07 '22 17:09 Croquembouche

I tried installing the dependencies with Ansible. With the currently listed Docker image, I still receive an invalid cross device id error when it tries to install or update CUDA and the relevant NVIDIA drivers.

Croquembouche avatar Sep 07 '22 17:09 Croquembouche

After some offline discussion, I was able to better understand the issue.

  • Installation of the CUDA drivers on the host machine is missing from the Docker installation instructions and from the Ansible playbook.

Currently, if someone starts from a fresh install of Ubuntu 20.04 and simply follows the Docker installation instructions, the CUDA drivers will be missing. This can be solved by installing all dependencies (./setup-dev-env.sh), but it would be better if this were included in the Docker dependencies (./setup-dev-env.sh docker).
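
A minimal sketch of a host-side check for the missing driver, to be run on the host before starting the container. It assumes that finding nvidia-smi on PATH is a reasonable proxy for an installed NVIDIA driver; the function names and the optional command-name argument are hypothetical.

```shell
#!/bin/sh
# Succeed if the given command (default: nvidia-smi) is on PATH.
# The optional argument is a hypothetical hook so the check can be exercised
# on machines without a GPU.
host_has_nvidia_driver() {
    command -v "${1:-nvidia-smi}" >/dev/null 2>&1
}

# Report the result and point to the setup script named in this thread.
check_host() {
    if host_has_nvidia_driver "$@"; then
        echo "NVIDIA driver tools found on host"
    else
        echo "nvidia-smi not found; run ./setup-dev-env.sh on the host first"
        return 1
    fi
}
```

Calling check_host with no arguments looks for nvidia-smi. This only detects that the tool is present; it does not verify that the driver version matches the CUDA version inside the image.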

maxime-clem avatar Sep 09 '22 09:09 maxime-clem

Hi, I installed Autoware via rocker following the instructions from [https://github.com/autowarefoundation/autoware-documentation/blob/main/docs/installation/autoware/docker-installation.md]. When I run colcon build --symlink-install --cmake-args -DCMAKE_BUILD_TYPE=Release, I get the following error:

--- stderr: trtexec_vendor
CMake Error at CMakeLists.txt:15 (find_package):
  By not providing "Findcudnn_cmake_module.cmake" in CMAKE_MODULE_PATH this
  project has asked CMake to find a package configuration file provided by
  "cudnn_cmake_module", but CMake did not find one.

  Could not find a package configuration file provided by
  "cudnn_cmake_module" with any of the following names:

    cudnn_cmake_moduleConfig.cmake
    cudnn_cmake_module-config.cmake

What should I do to fix this problem? I can find the cuDNN and TensorRT files (.h, .so) in the Docker container.

jfkkf123 avatar Oct 04 '22 15:10 jfkkf123

If you followed the instructions I am not sure what could be the issue.

  • Do you run colcon build inside the Docker container?
  • Did you install the dependencies with ./setup-dev-env.sh on the host (not inside the Docker container)?
  • Are you using an AMD64 architecture with an NVIDIA GPU?

maxime-clem avatar Oct 04 '22 16:10 maxime-clem

What should I do to fix this problem? I can find the cuDNN and TensorRT files (.h, .so) in the Docker container.

This error is probably caused by the following missing package: ros-galactic-tensorrt-cmake-module. Missing packages can be installed with the following command:

rosdep install -y --from-paths src --ignore-src --rosdistro $ROS_DISTRO
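
To see which packages are actually missing before or after running rosdep, a small check along these lines may help. It is only a sketch: the function name is hypothetical, it assumes a Debian-based image with dpkg-query, and ros-galactic-cudnn-cmake-module is a package name guessed from the ROS naming convention, not one confirmed in this thread.

```shell
#!/bin/sh
# Print every package given as an argument that dpkg does not report as
# fully installed. Prints nothing if all of them are installed.
missing_packages() {
    for pkg in "$@"; do
        dpkg-query -W -f='${Status}' "$pkg" 2>/dev/null \
            | grep -q 'install ok installed' \
            || printf '%s\n' "$pkg"
    done
}
```

For example, missing_packages ros-galactic-tensorrt-cmake-module ros-galactic-cudnn-cmake-module lists whichever of the two is absent inside the container.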

maxime-clem avatar Oct 06 '22 04:10 maxime-clem

I updated the Docker image (autoware-universe:latest-cuda) and the dependent ROS packages, and now everything works. Thank you.

jfkkf123 avatar Oct 07 '22 14:10 jfkkf123

The documentation and the Ansible script have been updated. I followed the Docker installation instructions from a fresh Ubuntu 22.04 install and had no issues.

maxime-clem avatar Dec 07 '22 07:12 maxime-clem