jetson-containers

JP 5

Open felrock opened this issue 3 years ago • 20 comments

I saw that the developer preview is out for JetPack 5.0, will we see a container for this any time soon? Thanks

felrock avatar Apr 12 '22 11:04 felrock

Hi @felrock, the l4t-pytorch container has been on NGC for a while: https://catalog.ngc.nvidia.com/orgs/nvidia/containers/l4t-pytorch

The l4t-ml and l4t-tensorflow containers should be pushed shortly. The jetson-inference container is already pushed to dockerhub. Working on the ROS containers now.

dusty-nv avatar Apr 18 '22 15:04 dusty-nv

Dustin,

Thank you for updating this repo for JP5!

I have managed to build my images on top of the L4T 34.1 base image from the hub, using your scripts as an outline for the rest. Everything builds fine on JetPack 5, but starting the container throws errors.

I added details to this thread: https://forums.developer.nvidia.com/t/error-running-privileged-l4t-containers-with-the-nvidia-runtime-on-jetpack-5-0-developer-preview/211108/8

Can you run your containers OK on the JetPack 5 dev preview?

asimonov avatar Apr 19 '22 06:04 asimonov

Hi @asimonov, sorry, I am unfamiliar with --privileged mode and don't know what is causing the error. I would recommend keeping an eye on that forum topic. BTW, are you able to run the container without the --privileged flag (i.e. is that what is triggering the errors)?

dusty-nv avatar Apr 19 '22 15:04 dusty-nv

I just checked. If I start the basic L4T image r34.1 with --privileged OR --volume="/dev:/dev", it fails with the above errors. If I remove both --privileged AND --volume="/dev:/dev", it starts OK.

asimonov avatar Apr 19 '22 16:04 asimonov

Hello! How did you manage to build it on Jetpack 5.0?

I am trying to run: $ ./scripts/docker_build_ml.sh pytorch

What I get is:

reading L4T version from /etc/nv_tegra_release
L4T BSP Version: L4T R34.1.0
l4t-base image: nvcr.io/nvidia/l4t-base:r34.1
selecting OpenCV for L4T R34.1.0...
OPENCV_URL=https://nvidia.box.com/shared/static/2hssa5g3v28ozvo3tc3qwxmn78yerca9.gz
OPENCV_DEB=OpenCV-4.5.0-aarch64.tar.gz
Python3 version: 3.8
building PyTorch torch-1.10.0-cp38-cp38-linux_aarch64.whl, torchvision v0.11.1, torchaudio v0.10.0, cuda arch 7.2;8.7
Building l4t-pytorch:r34.1.0-pth1.10-py3 container...
Sending build context to Docker daemon 169kB
Step 1/26 : ARG BASE_IMAGE=nvcr.io/nvidia/l4t-base:r32.4.4
Step 2/26 : FROM ${BASE_IMAGE}
pull access denied for jetpack, repository does not exist or may require 'docker login': denied: requested access to the resource is denied

Default runtime is set to nvidia. This is on Jetson Xavier AGX

fortminors avatar Apr 21 '22 07:04 fortminors
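For readers following the log above: its first two lines come from reading /etc/nv_tegra_release. Below is a minimal sketch of that step, not the repo's actual script. It assumes the first line of the file has the form "# R34 (release), REVISION: 1.0, ..."; the function name l4t_version is hypothetical.

```shell
# Sketch: derive the L4T version string (e.g. R34.1.0) from a release file.
# Assumption: the first line looks like
#   "# R34 (release), REVISION: 1.0, GCID: ..., BOARD: ..."
l4t_version() {
    # $1: path to the release file (defaults to /etc/nv_tegra_release)
    release_file="${1:-/etc/nv_tegra_release}"
    head -n 1 "$release_file" |
        sed -n 's/^# R\([0-9][0-9]*\) (release), REVISION: \([0-9.][0-9.]*\),.*$/R\1.\2/p'
}

# Only print on a machine that actually has the file (i.e. a Jetson).
if [ -r /etc/nv_tegra_release ]; then
    echo "L4T BSP Version: L4T $(l4t_version)"
fi
```

If the file does not match the assumed layout, the function prints nothing, which makes a mismatch easy to spot.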

Hi @fortminors, please pull the latest master from this repo, and then run scripts/docker_build_jetpack.sh first

dusty-nv avatar Apr 21 '22 13:04 dusty-nv
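The order suggested above can be sketched as a small script. This is an illustration only: the remote name "origin" is an assumption, the helper names run and build_pytorch_container are hypothetical, and the comment about what docker_build_jetpack.sh produces is inferred from the "pull access denied for jetpack" error rather than confirmed in this thread. DRY_RUN defaults to 1, so the commands are printed rather than executed.

```shell
# Sketch of the suggested build order. With DRY_RUN=1 (the default) each
# command is only printed; set DRY_RUN=0 inside a jetson-containers checkout
# to actually execute them.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

build_pytorch_container() {
    # 1. Update the checkout (assumes the remote is named "origin").
    run git pull origin master &&
    # 2. Build the JetPack image first; presumably this is the local
    #    "jetpack" image the failed pull was looking for.
    run ./scripts/docker_build_jetpack.sh &&
    # 3. Then build the PyTorch container as before.
    run ./scripts/docker_build_ml.sh pytorch
}

build_pytorch_container
```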

BTW, a prebuilt l4t-pytorch image is now published on NGC if you would prefer to just use that:

sudo docker pull nvcr.io/nvidia/l4t-pytorch:r34.1.0-pth1.12-py3

dusty-nv avatar Apr 21 '22 13:04 dusty-nv

I just checked. If I start the basic L4T image r34.1 with --privileged OR --volume="/dev:/dev", it fails with the above errors. If I remove both --privileged AND --volume="/dev:/dev", it starts OK.

FYI, there is a workaround in the forum thread linked above.

asimonov avatar Apr 29 '22 16:04 asimonov

@dusty-nv I double-checked: I did what you suggested, and docker_build_jetpack.sh terminates at step 14/19 with the message: E: Unable to locate package python3-vpi2

The problem I have with the l4t-pytorch image on JetPack 5.0 is that running torch.cuda.is_available() inside the container returns False for some reason.

I've done the same thing with JetPack 4.6.1 and it works perfectly, but I can't use it since I need Python >= 3.7, and the suggested l4t-pytorch image for JetPack 4.6.1 runs Python 3.6.

Am I doing something wrong?

fortminors avatar Apr 29 '22 19:04 fortminors

@dusty-nv I even tried the deviceQuery CUDA sample from inside the container:

I am pulling the container as suggested:

sudo docker pull nvcr.io/nvidia/l4t-pytorch:r34.1.0-pth1.12-py3

I am launching the container as follows:

sudo docker run --rm -it --entrypoint bash nvcr.io/nvidia/l4t-pytorch:r34.1.0-pth1.12-py3

cd /usr/local/cuda/samples/1_Utilities/deviceQuery
sudo make
./deviceQuery
./deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

cudaGetDeviceCount returned 35
-> CUDA driver version is insufficient for CUDA runtime version
Result = FAIL

How do I make it work? This is on Jetson Xavier AGX

fortminors avatar Apr 29 '22 19:04 fortminors

How do I make it work? This is on Jetson Xavier AGX

Is this on JetPack 5.0 DP? l4t-pytorch:r34.1.0-pth1.12-py3 is for JetPack 5.0

dusty-nv avatar Apr 29 '22 20:04 dusty-nv

@dusty-nv Yes, correct. Jetpack 5.0 DP.

I've had everything working perfectly on JetPack 4.6.1 thanks to your images, but now I need Python 3.7 or newer (actually 3.8, to match the prebuilt torch images), so I switched to JetPack 5.0, where these issues occur.

fortminors avatar Apr 29 '22 20:04 fortminors

@dusty-nv Hello! Are there any updates on this? Jetpack 5.0 doesn't seem to be able to detect cuda for some reason

fortminors avatar May 11 '22 09:05 fortminors

Might help: I fixed the missing cuda issue on 4.6.1 by adding host cuda directories as volumes to the docker, and adding them to LD_LIBRARY_PATH, and running 'sudo ldconfig'.

javadan avatar May 11 '22 09:05 javadan

@javadan Thanks, however I have everything working on Jetpack 4.6.1 with python 3.6.

Jetpack 5.0 DP has got this weird problem that I am unable to fix

fortminors avatar May 11 '22 09:05 fortminors

@javadan Maybe I have misunderstood you. Are you working with python 3.8 on jetpack 4.6.1? I just need this exact setup as well as torch and torchvision of appropriate versions (so >= 1.11 for python 3.8), but there doesn't seem to be an image with python 3.8 (+ torch, torchvision) on Jetpack 4.6.1

fortminors avatar May 11 '22 10:05 fortminors

@fortminors No, sorry, still python 3.6.9. I meant I've had missing CUDA lib issues on 4.6.1, and that's how I eventually fixed it, adding docker volumes, and refreshing C++ linker paths.

I'm just following the issues here, so I know when to upgrade to 5.0, haha.

javadan avatar May 11 '22 10:05 javadan

@javadan Got it, thanks :) yes, looking forward to the full release of 5.0. There haven't been any announcements on the approximate dates, right?

fortminors avatar May 11 '22 10:05 fortminors

sudo docker run --rm -it --entrypoint bash nvcr.io/nvidia/l4t-pytorch:r34.1.0-pth1.12-py3

It looks like you are missing --runtime nvidia, which is what enables GPU in the container. Can you try starting the container with that?

dusty-nv avatar May 11 '22 13:05 dusty-nv
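For reference, here is a minimal sketch of the launch command with the flag added, plus a one-off CUDA check. The helper names run, check_cuda and start_shell are hypothetical, and using --entrypoint python3 for the check is an assumption about what is convenient rather than something from this thread. DRY_RUN defaults to 1, so the commands are printed rather than executed.

```shell
# Sketch: the original launch command with --runtime nvidia added.
# With DRY_RUN=1 (the default) commands are only printed; set DRY_RUN=0 on a
# Jetson to actually run them.
DRY_RUN="${DRY_RUN:-1}"
IMAGE="nvcr.io/nvidia/l4t-pytorch:r34.1.0-pth1.12-py3"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# One-off check: asks PyTorch inside the container whether it can see CUDA.
check_cuda() {
    run sudo docker run --rm --runtime nvidia --entrypoint python3 "$IMAGE" \
        -c 'import torch; print(torch.cuda.is_available())'
}

# Interactive shell, same as the original command plus the runtime flag.
start_shell() {
    run sudo docker run --rm -it --runtime nvidia --entrypoint bash "$IMAGE"
}

check_cuda
```

On a working JetPack 5.0 setup the check would be expected to report that CUDA is available; without --runtime nvidia the same check is what returned False earlier in the thread.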

@dusty-nv You are absolutely right. I completely forgot about this flag :) Now torch sees cuda

Thank you very much!

fortminors avatar May 11 '22 14:05 fortminors