autoware.universe

Can't launch lidar_centerpoint with error: could not load library: libcenterpoint_cuda_libraries.so.

Open mikechan0731 opened this issue 2 years ago • 9 comments

Checklist

  • [X] I've read the contribution guidelines.
  • [X] I've searched other issues and no duplicate issues were found.
  • [X] I'm convinced that this is not my fault but a bug.

Description

I am using a custom Docker image and want to try the ROS 2 lidar_centerpoint package. The build succeeds, but running ros2 launch produces the following error. [image]

Expected behavior

I expect to be able to run object recognition on point cloud data.

Actual behavior

I made sure that the docker command gives the container access to the GPU. The command is as follows:

docker run --user $(id -u):$(id -g) --rm -it  --gpus all -e DISPLAY -e TERM   -e QT_X11_NO_MITSHM=1 -e XAUTHORITY=/tmp/.dockerjwzszsxi.xauth -v /tmp/.dockerjwzszsxi.xauth:/tmp/.dockerjwzszsxi.xauth -v /tmp/.X11-unix:/tmp/.X11-unix -v /etc/localtime:/etc/localtime:ro autoware-universe-ros2-export:v1 /bin/bash

Steps to reproduce

Nvidia-smi

Versions

Docker Link: https://drive.google.com/file/d/1sqpyMhZeLFWqCIbepB8lhlv8140bbxac/view?usp=sharing

OS: Ubuntu 20.04

ROS 2 Galactic

Possible causes

No response

Additional context

No response

mikechan0731 avatar Jun 08 '22 11:06 mikechan0731

I'll check with your docker image.

yukke42 avatar Jun 10 '22 06:06 yukke42

@mikechan0731 I successfully launched lidar_centerpoint in my environment.

docker run --user $(id -u):$(id -g) --rm -it  --gpus all -e DISPLAY -e TERM -e QT_X11_NO_MITSHM=1 -v /tmp/.X11-unix:/tmp/.X11-unix -v /etc/localtime:/etc/localtime:ro -v $PWD:/home/itri/autoware_workspace 81e1c4fb8f3e /bin/bash

Screenshot from 2022-06-10 16-54-41

Could you describe your CUDA environment in more detail?

I also got the warnings below, but there were no warnings when I built the package using the Autoware docker image.

Screenshot from 2022-06-10 16-58-40

yukke42 avatar Jun 10 '22 07:06 yukke42

Hi! @yukke42

Thank you so much for testing. It seems there is no problem with the docker image itself; the problem is my GPU setup. I will try it on a different computer.

My GPU is an NVIDIA RTX 2080.

Local environment: driver = 440, CUDA = 10.2 (as shown) [image]

The environment inside docker has the same driver but CUDA = 11.4. I don't know whether this is a problem. [image]

Thanks again for your assistance!!

mikechan0731 avatar Jun 13 '22 02:06 mikechan0731

@mikechan0731

Local environment driver = 440 , cuda = 10.2 (as shown)

This error might be caused by a version mismatch: the local NVIDIA driver doesn't support CUDA 11.1.
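
A rough way to check for this kind of mismatch is sketched below. The helper name `driver_supports_cuda11` and the threshold `MIN_DRIVER_MAJOR=450` are assumptions for illustration (the threshold is based on NVIDIA's published compatibility table for CUDA 11.x on Linux, which should be consulted for the exact minimum); comparing only the major driver number is an approximation.

```shell
#!/usr/bin/env bash
# Sketch: decide whether an NVIDIA driver version is new enough for CUDA 11.x.
# ASSUMPTION: CUDA 11.x needs a 450-series (or newer) driver on Linux.
MIN_DRIVER_MAJOR=450

# Hypothetical helper: succeeds (exit 0) if the driver version string,
# e.g. "470.57.02", has a major number >= MIN_DRIVER_MAJOR.
driver_supports_cuda11() {
  local major="${1%%.*}"   # keep only the text before the first dot
  [ "$major" -ge "$MIN_DRIVER_MAJOR" ]
}

# Usage on a machine with the NVIDIA driver installed:
#   ver="$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)"
#   if driver_supports_cuda11 "$ver"; then echo "driver OK"; else echo "driver too old"; fi
```

Under this assumption a 440-series driver would be reported as too old for a CUDA 11 container, while a 470-series driver would pass.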

yukke42 avatar Jun 15 '22 04:06 yukke42

Hi, I tested a new environment with NVIDIA driver 470. lidar_centerpoint builds without error, but it still shows this message when I try to launch it: [image]

I used colcon build --continue-on-error and 150 packages were built. [image]

I am not sure how this happens, and I would really like to test the performance.

Here is a link to the code I built (part of autoware.universe + part of autoware.common, 2.8G): https://1drv.ms/u/s!AnJ4ubRnmXsujIQxTIZucL7Ump9w7A?e=neq4JP

Thanks!

mikechan0731 avatar Jun 24 '22 03:06 mikechan0731

Can you try rebuilding the lidar_centerpoint package after removing its built targets in /build and /install? Also, please make sure that the CUDA version in docker is the same as on the local system.
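
A minimal sketch of that clean rebuild, assuming a standard colcon workspace layout (`build/<package>` and `install/<package>`). The helper name `clean_package` and the workspace path in the usage comment are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: delete the cached build and install output of ONE
# package so that the next colcon build starts from scratch for it.
clean_package() {
  local ws="$1" pkg="$2"
  # ${var:?} aborts if a variable is empty, guarding against "rm -rf /build/".
  rm -rf "${ws:?}/build/${pkg:?}" "${ws:?}/install/${pkg:?}"
}

# Usage (workspace path is an assumption):
#   clean_package "$HOME/autoware_workspace" lidar_centerpoint
#   cd "$HOME/autoware_workspace" && colcon build --packages-select lidar_centerpoint
#   nvcc --version   # run on the host and inside the container, then compare
```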

Sharrrrk avatar Jul 18 '22 08:07 Sharrrrk

@mikechan0731 do you have any updates?

xmfcx avatar Jul 26 '22 16:07 xmfcx

@mikechan0731 tried rebuilding the stack and still had the issue, so he decided to use the default Autoware docker image instead. We will close this issue until someone else faces a similar issue with their custom docker image.

mitsudome-r avatar Aug 02 '22 07:08 mitsudome-r

@mitsudome-r I have the same issue with a custom environment (NVIDIA NGC TensorRT container with tensorrt 8.4.1-1+cuda11.6).

I fixed the issue by adding the following to CMakeLists.txt:

  install(
    TARGETS
      centerpoint_cuda_lib
  )

You can find the commit here.

It turns out libcenterpoint_cuda_libraries.so is built in the build/lidar_centerpoint folder but never installed to the install/lidar_centerpoint folder. I am curious why this would work with some versions of CUDA and TensorRT, as it seems to be a CMake issue. tensorrt_yolo, for example, has a similar line that installs its built CUDA library:

https://github.com/autowarefoundation/autoware.universe/blob/main/perception/tensorrt_yolo/CMakeLists.txt#L192
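
To confirm the same symptom in another workspace, something like the sketch below can compare the build and install trees. The helper name `check_installed` and the workspace path in the usage comment are hypothetical; the layout assumed is the default colcon one.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether a library file exists under
# build/<pkg> but is missing from install/<pkg>.
check_installed() {
  local ws="$1" pkg="$2" lib="$3"
  local built installed
  # Take the first match in each tree; an empty string means "not found".
  built="$(find "$ws/build/$pkg" -name "$lib" 2>/dev/null | head -n 1 || true)"
  installed="$(find "$ws/install/$pkg" -name "$lib" 2>/dev/null | head -n 1 || true)"
  if [ -n "$installed" ]; then
    echo "installed"
  elif [ -n "$built" ]; then
    echo "built-not-installed"
  else
    echo "not-built"
  fi
}

# Usage (workspace path is an assumption):
#   check_installed "$HOME/autoware_workspace" lidar_centerpoint libcenterpoint_cuda_libraries.so
```

A result of `built-not-installed` would match the situation described above, where the library is compiled but no install rule copies it into the install tree.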

I suggest reopening this issue to see if there are other causes. Otherwise, the example commit above would be the fix.

HaoruXue avatar Aug 27 '22 04:08 HaoruXue

Closing this issue since this error is fixed in PR https://github.com/autowarefoundation/autoware.universe/pull/1916.

yukke42 avatar Sep 21 '22 05:09 yukke42