balena-jetson
docker devicerequests/nvidia support
We want to support exposing GPU resources to user containers via the new DeviceRequests API introduced in Docker 19.03.x.
To enable this we need the NVIDIA driver, the userland driver-support libraries, and libnvidia-container in the host OS.
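The end goal, roughly, is that a user can request the GPU with the standard flag; a minimal sketch (the image here is just an example used later in this thread):
balena run --rm -it --gpus all nvcr.io/nvidia/l4t-base:r32.3.1 bash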
~~WIP branch: https://github.com/balena-os/balena-jetson/tree/rgz/cuda_libs_test~~ Depends-on: https://github.com/balena-os/meta-balena/pull/1824 [merged :confetti_ball:]
arch call notes (internal): https://docs.google.com/document/d/1tFaDKyTsdi1TUfxfAjAAGJCfUVCwmPxIstdrYaOJ-I0
arch call item (internal): https://app.frontapp.com/open/cnv_5bqfytf
@acostach I think we will need: https://github.com/madisongh/meta-tegra/blob/master/recipes-devtools/cuda/cuda-driver_10.0.326-1.bb
which would give us the driver libs, right?
And we will also need https://github.com/madisongh/meta-tegra/tree/master/recipes-containers/libnvidia-container-tools for the Docker integration.
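For a quick test build, something along these lines in local.conf might be enough to pull both in (untested sketch; the exact package names may differ from the recipe names):
echo 'IMAGE_INSTALL_append = " cuda-driver libnvidia-container-tools"' >> build/conf/local.conf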
Looks like they might, @robertgzr. I need to check with a Yocto build that includes these two packages. It will take a bit, because the CUDA packages first need to be downloaded locally with the NVIDIA SDK Manager; they and their dependencies can't be pulled by Yocto automatically. I'll get back to you.
I think we will still run out of space. I already have trouble getting the new balena-engine binary onto some devices because of the binary's size increase, and the CUDA stuff is going to add at least another 15 MB or so.
@robertgzr I built an image with those, and it's available on the dev device 3d612ed56aaa2ba22cf73ba7a2021cb7 if you want to test with the patched engine.
A couple of notes:
- libcuda appears to come from tegra-libraries, which is a package with ~130 MB worth of NVIDIA libraries (libnv*, libnvidia*). Not sure whether only some of them or all are tied together; for instance, cuda-driver adds a dependency on tegra-libraries. But if you get it to work, we can probably try removing them one by one and see if anything breaks.
- I increased the rootfs size to allow plenty of space for testing with these packages and the new engine.
@acostach do you have a branch here I can use? I would like to pull in the engine via balena-os/meta-balena#1824 rather than copying the binary around...
@acostach I'm trying to figure out why it won't work out of the box...
root@3d612ed:~# nvidia-container-cli info
NVRM version: (null)
CUDA version: 10.0
Device Index: 0
Device Minor: 0
Model: NVIDIA Tegra X2
Brand: (null)
GPU UUID: (null)
Bus Location: (null)
Architecture: 6.2
root@3d612ed:~# balena run --rm -it --gpus all nvcr.io/nvidia/l4t-base:r32.3.1 bash
balena: Error response from daemon: could not select device driver "" with capabilities: [[gpu]].
I feel like the info command should return a UUID (unfortunately I didn't get to test this on the playground box last week).
Also, nvidia-container-cli can be used to ask which components of the driver are required:
root@3d612ed:~# nvidia-container-cli list
/usr/lib/libcuda.so.1.1
/usr/lib/libnvidia-ptxjitcompiler.so.32.3.1
/usr/lib/libnvidia-fatbinaryloader.so.32.3.1
/usr/lib/libnvidia-eglcore.so.32.3.1
/usr/lib/libnvidia-glcore.so.32.3.1
/usr/lib/libnvidia-tls.so.32.3.1
/usr/lib/libnvidia-glsi.so.32.3.1
/usr/lib/libGLX_nvidia.so.0
/usr/lib/libEGL_nvidia.so.0
/usr/lib/libGLESv2_nvidia.so.2
/usr/lib/libGLESv1_CM_nvidia.so.1
Where can I see which version of the driver is installed on the TX2?
@robertgzr it's the 32.3.1 driver from l4t 32.3.1, if that's what you are referring to.
root@3d612ed:~# modinfo /lib/modules/4.9.140-l4t-r32.3.1/kernel/drivers/gpu/nvgpu/nvgpu.ko
filename: /lib/modules/4.9.140-l4t-r32.3.1/kernel/drivers/gpu/nvgpu/nvgpu.ko
alias: of:NTCnvidia,gv11bC*
alias: of:NTCnvidia,gv11b
alias: of:NTCnvidia,tegra186-gp10bC*
alias: of:NTCnvidia,tegra186-gp10b
alias: of:NTCnvidia,tegra210-gm20bC*
alias: of:NTCnvidia,tegra210-gm20b
depends:
intree: Y
vermagic: 4.9.140-l4t-r32.3.1 SMP preempt mod_unload modversions aarch64
Wondering if it returns nulls because the GPU isn't initialized: the firmware blobs that are usually extracted into the container weren't loaded by the driver, as they aren't in the hostOS. I'm referring to (BSP archive) Tegra186_Linux_R32.3.1.tbz2/Linux_for_tegra/nv_tegra/nvidia_drivers.tbz2/lib/firmware/tegra18x, gp10b. Not sure this is the issue, but can you try to initialize it first, maybe from a container (then shut the container down but leave the driver loaded), or unpack nvidia_drivers directly in the hostOS?
You can follow https://github.com/balena-io-playground/tx2-container-contracts-sample/blob/16d3ad09f0615956389f04105e3b533be9620388/tx2_32_2/Dockerfile.template#L7 but use the 32.3.1 BSP archive for the TX2 from here: https://developer.nvidia.com/embedded/linux-tegra
I haven't had time yet to look into or release a 32.3.1-based balenaOS for the TX2, but if you are having issues with unpacking the BSP archive in the container, here's how it works for the Nano on 32.3.1: https://github.com/acostach/jetson-nano-container-contracts/blob/51e9bfa97a91692c6b806ed32c9e96e656f5b088/nano_32_3_1/Dockerfile.template#L7
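If it helps, here's a rough, untested sketch of what "unpack nvidia_drivers directly in the hostOS" could look like on a dev device (archive layout assumed from the BSP paths above):
mount -o remount,rw /   # dev image only; the rootfs is normally read-only
tar xjf Tegra186_Linux_R32.3.1.tbz2 Linux_for_Tegra/nv_tegra/nvidia_drivers.tbz2
tar xjf Linux_for_Tegra/nv_tegra/nvidia_drivers.tbz2 -C / lib/firmware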
I think we're fine in the driver department. It looks like Docker only loads its internal compat layer for the nvidia stuff if nvidia-container-runtime-hook is present on the hostOS.
I'm going to see if I can find where this is supposed to come from, but I think it's usually installed as part of the libnvidia-container package.
@acostach ok so we need this: https://github.com/NVIDIA/nvidia-container-runtime/tree/v3.1.4/toolkit/nvidia-container-toolkit
which is provided by https://github.com/madisongh/meta-tegra/tree/master/recipes-containers/nvidia-container-toolkit
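Once that package is in the image, a quick sanity check would be something like:
command -v nvidia-container-runtime-hook   # the engine only enables its nvidia code path if the hook is in PATH
balena run --rm -it --gpus all nvcr.io/nvidia/l4t-base:r32.3.1 bash   # retry the earlier failing command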
@robertgzr thanks, I've updated https://github.com/balena-os/balena-jetson/commits/cuda_libs_test with this package, let me know if it works with it
@acostach any idea why the runtime-hook is complaining about missing libraries:
root@3d612ed:~# nvidia-container-runtime-hook
nvidia-container-runtime-hook: error while loading shared libraries: libstd.so: cannot open shared object file: No such file or directory
Isn't libstd a Rust thing?
Not sure, @robertgzr. I see this libstd is provided by Rust in the rootfs, but perhaps the hook binary comes pre-compiled and was built against a different version of the library?
root@3d612ed:~# export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib/rust/
root@3d612ed:~# nvidia-container-runtime-hook
nvidia-container-runtime-hook: symbol lookup error: nvidia-container-runtime-hook: undefined symbol: main
The thing is, there is no Rust dependency, and the hook binary should be built from source by https://github.com/madisongh/meta-tegra/blob/master/recipes-containers/nvidia-container-toolkit/nvidia-container-toolkit_1.0.5.bb from https://github.com/NVIDIA/container-toolkit/tree/60f165ad6901f85b0c3acbf7ce2c66cd759c4fb8/nvidia-container-toolkit no?
Something is wrong here... but I don't understand what.
@robertgzr It doesn't look like a Rust dependency, unless I'm mistaken somewhere. And that's right, the hook binary is built from sources, but they are Go sources.
So it appears there are two libstds: one from Rust, as you said, which isn't the one we want, and another one from Go. The Go version that we currently have in the image comes from meta-balena and is at version 1.10.
I think the hook was built against some newer Go 'headers', although I'm not familiar with the Go workflow or build process.
root@3d612ed:~# export LD_LIBRARY_PATH=/home/root/ # this is where I copied libstd.so provided by go on the shared board
root@3d612ed:~# nvidia-container-runtime-hook
nvidia-container-runtime-hook: symbol lookup error: nvidia-container-runtime-hook: undefined symbol: runtime.arm64HasATOMICS
Looking at this: https://github.com/golang/go/blob/a1550d3ca3a6a90b8bbb610950d1b30649411243/src/cmd/internal/goobj2/builtinlist.go#L185 I see the symbol 'runtime.arm64HasATOMICS' is present starting from Go ~1.14. So I manually updated to Go 1.14, updated the poky class to zeus, rebuilt go and nvidia-container-toolkit, then uploaded the nvidia-container-toolkit and libstd.so binaries to the shared board, and it appears to work:
root@3d612ed:~# export LD_LIBRARY_PATH=/home/root/
root@3d612ed:~# nvidia-container-runtime-hook
Usage of nvidia-container-runtime-hook:
-config string
configuration file
-debug
enable debug output
Please try to run it again now on the shared device and check if this unblocks you.
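For reference, a couple of commands that should show the mismatch directly (assuming binutils is available on the board; paths taken from above):
readelf -d "$(command -v nvidia-container-runtime-hook)" | grep NEEDED   # shows that the hook wants a shared libstd.so
nm -D /home/root/libstd.so | grep arm64HasATOMICS   # checks whether the shared Go runtime exports the missing symbol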
@acostach oh ok, that makes more sense now... Sounds to me like something is still up with our Go integration in meta-balena. If you check my PR here: https://github.com/balena-os/meta-balena/pull/1824
This provides the nvidia-enabled balena-engine; part of it is a bump to Go 1.12.12.
Sounds like the build of nvidia-container-toolkit uses a different Go toolchain than that one? How is that possible? I thought we could enforce it via the GOVERSION env var from meta-balena.
Looking at this: golang/go:src/cmd/internal/goobj2/builtinlist.go@a1550d3#L185 I see the symbol 'runtime.arm64HasATOMICS' is present starting from go version ~1.14
The toolkit recipe shouldn't have a dependency on any particular version of Go, btw. If it gets built with 1.12.12 it should just work.
I have actually never encountered something like this. I didn't even know the Go stdlib could be loaded as a shared library.
I looked at the documentation a little bit:
- Yocto compiles the Go runtime (which includes the stdlib) as a shared library: http://git.yoctoproject.org/cgit/cgit.cgi/poky/tree/meta/recipes-devtools/go/go-runtime.inc?h=zeus#n40
- the go.bbclass in poky has a switch, GO_DYNLINK, to link a recipe against that shared lib: http://git.yoctoproject.org/cgit/cgit.cgi/poky/tree/meta/classes/go.bbclass?h=zeus#n35
- that is enabled for supported platforms by default, I think: http://git.yoctoproject.org/cgit/cgit.cgi/poky/tree/meta/classes/goarch.bbclass?h=zeus#n26
- https://golang.org/cmd/go/#hdr-Compile_packages_and_dependencies (ctrl-f "linkshared")
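So one way out might be to turn off dynamic linking of the Go runtime for just this recipe via a bbappend (untested sketch; the recipe path is an assumption, and goarch.bbclass sets the variable per-arch, hence the extra override):
cat > recipes-containers/nvidia-container-toolkit/nvidia-container-toolkit_%.bbappend <<'EOF'
# build the hook with the Go runtime linked statically instead of against libstd.so
GO_DYNLINK = ""
GO_DYNLINK_aarch64 = ""
EOF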
I manually compiled it without those flags and now it works:
root@3d612ed:~# balena run --gpus all -it balenalib/jetson-tx2-ubuntu:bionic-run bash
root@71f92d3822b7:/#
Hi @robertgzr, @acostach. I'm happy to see progress on the subject. We have been asking for this feature for a long time. Will this be supported in BalenaOS anytime soon?
Hi @dremsol, it's currently something we're considering and investigating; we don't have a timeline, as final conclusions haven't been reached yet.
Hi @robertgzr & @acostach,
I've taken a deeper look at this issue and I would like to share our experiences. I also have a couple of questions which I hope you can answer. First of all, our custom OS indeed shows similar output:
root@photon-nano:~# nvidia-container-cli info
NVRM version: (null)
CUDA version: 10.0
Device Index: 0
Device Minor: 0
Model: NVIDIA Tegra X1
Brand: (null)
GPU UUID: (null)
Bus Location: (null)
Architecture: 5.3
root@photon-nano:~# nvidia-container-cli list
/usr/lib/libcuda.so.1.1
/usr/lib/libnvidia-ptxjitcompiler.so.32.3.1
/usr/lib/libnvidia-fatbinaryloader.so.32.3.1
/usr/lib/libnvidia-eglcore.so.32.3.1
/usr/lib/libnvidia-glcore.so.32.3.1
/usr/lib/libnvidia-tls.so.32.3.1
/usr/lib/libnvidia-glsi.so.32.3.1
/usr/lib/libGLX_nvidia.so.0
/usr/lib/libEGL_nvidia.so.0
/usr/lib/libGLESv2_nvidia.so.2
/usr/lib/libGLESv1_CM_nvidia.so.1
This allows running the CUDA samples by pulling nvcr.io/nvidia/l4t-base:r32.4.2 just fine, under the assumption that the CUDA libs are installed in the hostOS. So far so good, and probably the goal you want to achieve in this issue.
The first thing I would like to point out is that mounting a CSI camera into the Docker container requires a daemon running in the hostOS (tegra-argus-daemon). The additional argument to add to the run command for accessing a CSI camera from within the container is -v /tmp/argus_socket:/tmp/argus_socket. For a USB camera, the additional argument is --device /dev/video0:/dev/video0.
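Putting those together, a run command for a camera-using app would look something like this (image name and device path are just examples):
balena run --rm -it --gpus all \
  -v /tmp/argus_socket:/tmp/argus_socket \
  --device /dev/video0:/dev/video0 \
  my-app-image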
Now we are building our application using the deepstream-l4t container, and this is where it gets interesting, as the required hostOS packages become application-dependent. Besides CUDA, it requires cuDNN and TensorRT. While this is still feasible to include somehow (either statically or configurable through balenaCloud), it becomes a mess once you need to include the application-specific gstreamer plugins in the hostOS. To give a small snippet (not optimized):
# NVIDIA
IMAGE_INSTALL_append = " cuda-driver cuda-toolkit nvidia-container-runtime cuda-samples nvidia-docker cudnn tensorrt libvisionworks libvisionworks-sfm libvisionworks-tracking tegra-tools tegra-argus-daemon"
# gstreamer and plugings
## nvidia specific packages
IMAGE_INSTALL_append = " gstreamer1.0-omx-tegra gstreamer1.0-plugins-nveglgles gstreamer1.0-plugins-nvvideo4linux2 gstreamer1.0-plugins-nvvideosinks"
## most of these are pulled in as dependencies of the nvidia specific packages
## specify them explicitly as dependencies here to ensure they are included
## TODO: check depends and cleanup
IMAGE_INSTALL_append = " gstreamer1.0 gstreamer1.0-meta-base gstreamer1.0-plugins-base gstreamer1.0-plugins-bad"
IMAGE_INSTALL_append = " gstreamer1.0-plugins-good gstreamer1.0-python gstreamer1.0-rtsp-server gstreamer1.0-vaapi"
As our goal is clear, how do you see this fitting into the balena ecosystem? A 'one image to rule them all' approach would not work for all applications, I guess.
[robertgzr] This issue has attached support thread https://jel.ly.fish/#/de9ddbf3-0b65-4cba-a2e2-38e43855f1bd
@dremsol how difficult do you think it would be to run the tegra-argus-daemon itself in a container as well?
Then you could just share the socket with your app container, and it would give you full control over the dependencies too.
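Something along these lines, maybe (image names are placeholders, untested): run the daemon in its own privileged container and share /tmp through a named volume so the app container sees the socket:
balena run -d --name argus --privileged -v argus-socket:/tmp argus-daemon-image
balena run --rm -it --gpus all -v argus-socket:/tmp app-image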
Hi @robertgzr,
Good suggestion, we haven't tested that so far. We forked balena-jetson and got nvidia-container-runtime working with balena-engine. Besides CUDA, we included cuDNN, TensorRT, and VisionWorks (jetson-nano), as required by the NGC l4t containers, with some minor changes in nvidia-container-runtime (runc vs. balena-runc).
root@balena:~# balena run -it --rm --net=host --runtime nvidia nvcr.io/nvidia/deepstream-l4t:4.0.2-19.12-base
Unable to find image 'nvcr.io/nvidia/deepstream-l4t:4.0.2-19.12-base' locally
4.0.2-19.12-base: Pulling from nvidia/deepstream-l4t
8aaa03d29a6e: Pull complete
......
bcac47627c16: Pull complete
Total: [==================================================>] 559.6MB/559.6MB
Digest: sha256:58c0e19332824da544b72c5eae063d1f1a0ea876af76a8e519dd71aeb023d1de
Status: Downloaded newer image for nvcr.io/nvidia/deepstream-l4t:4.0.2-19.12-base
root@balena:~#
Depending on the application, the following packages may be installed in the hostOS, where the container-runtime-csv bbclass makes the appropriate nvidia runtime links (see the note after this list for what those links look like):
./external/openembedded-layer/recipes-multimedia/v4l2apps/v4l-utils_%.bbappend:inherit container-runtime-csv
./recipes-devtools/visionworks/libvisionworks-sfm_0.90.4.bb:inherit nvidia_devnet_downloads container-runtime-csv
./recipes-devtools/visionworks/libvisionworks_1.6.0.500n.bb:inherit nvidia_devnet_downloads container-runtime-csv
./recipes-devtools/visionworks/libvisionworks-tracking_0.88.2.bb:inherit nvidia_devnet_downloads container-runtime-csv
./recipes-devtools/gie/tensorrt_6.0.1-1.bb:inherit nvidia_devnet_downloads container-runtime-csv
./recipes-devtools/cudnn/cudnn_7.6.3.28-1.bb:inherit nvidia_devnet_downloads container-runtime-csv
./recipes-devtools/cuda/cuda-shared-binaries-10.0.326-1.inc:inherit container-runtime-csv
./recipes-devtools/cuda/cuda-cudart_10.0.326-1.bb:inherit container-runtime-csv siteinfo
./recipes-bsp/tegra-binaries/gstreamer1.0-plugins-tegra_32.3.1.bb:inherit container-runtime-csv
./recipes-bsp/tegra-binaries/tegra-libraries_32.3.1.bb:inherit container-runtime-csv
./recipes-bsp/tegra-binaries/tegra-firmware_32.3.1.bb:inherit container-runtime-csv
./recipes-bsp/tegra-binaries/libdrm-nvdc_32.3.1.bb:inherit container-runtime-csv
./recipes-bsp/tegra-binaries/tegra-nvphs-base_32.3.1.bb:inherit container-runtime-csv
./recipes-multimedia/libv4l2/libv4l2-minimal_1.18.0.bb:inherit autotools gettext pkgconfig container-runtime-csv
./recipes-multimedia/gstreamer/gstreamer1.0-plugins-nvjpeg_1.14.0-r32.3.1.bb:inherit autotools gtk-doc gettext pkgconfig container-runtime-csv
./recipes-multimedia/gstreamer/gstreamer1.0-omx-tegra_1.0.0-r32.3.1.bb:inherit autotools pkgconfig gettext container-runtime-csv
./recipes-multimedia/gstreamer/gstreamer1.0-plugins-nveglgles_1.2.3-r32.3.1.bb:inherit autotools gettext gobject-introspection pkgconfig container-runtime-csv
./recipes-multimedia/gstreamer/gstreamer1.0-plugins-nvvideo4linux2_1.14.0-r32.3.1.bb:inherit gettext pkgconfig container-runtime-csv
./recipes-multimedia/gstreamer/gstreamer1.0-plugins-nvvideosinks_1.14.0-r32.3.1.bb:inherit gettext pkgconfig container-runtime-csv
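For context, the .csv files that container-runtime-csv generates (and that libnvidia-container's jetson support consumes) are plain "<type>, <path>" lists of what to bind into the container, roughly like this (the file name and entries below are illustrative only):
# e.g. /etc/nvidia-container-runtime/host-files-for-container.d/<package>.csv might contain:
lib, /usr/lib/libcuda.so.1.1
sym, /usr/lib/libcuda.so.1
dev, /dev/nvhost-ctrl-gpu
dir, /usr/local/cuda-10.0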
As nvidia-container-runtime expects JetPack as the hostOS, it's not yet clear to me which packages are really necessary besides the ones already included. Anyway, we had to include the nvidia-specific gstreamer packages in our custom OS to get our application running within the deepstream container. ~~I've tried to include them with Balena but didn't succeed so far, as balena-jetson depends on warrior (vs. zeus in meta-tegra to support nvidia-container-runtime).~~
➜ resin-image git:(master) cat installed-package-sizes.txt | head -n 10
436914 KiB libcudnn7
218999 KiB tensorrt
106699 KiB tegra-libraries
91930 KiB cuda-cublas
52629 KiB balena
39126 KiB kernel-image-initramfs
35853 KiB go-runtime
27149 KiB libvisionworks
Hi @robertgzr & @acostach,
Had a good talk with Joe today and he asked me to keep you updated. It seems that the nvidia runtime is working nicely with balena-engine and the Host packages are being mapped accordingly by using the mount plugin.
Running the deviceQuery sample returns a PASS:
root@balena:/tmp/deviceQuery# balena run -it --runtime nvidia devicequery
./deviceQuery Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
Detected 1 CUDA Capable device(s)
Device 0: "NVIDIA Tegra X1"
CUDA Driver Version / Runtime Version 10.0 / 10.0
CUDA Capability Major/Minor version number: 5.3
Total amount of global memory: 3962 MBytes (4154109952 bytes)
( 1) Multiprocessors, (128) CUDA Cores/MP: 128 CUDA Cores
GPU Max Clock rate: 922 MHz (0.92 GHz)
Memory Clock rate: 1600 Mhz
Memory Bus Width: 64-bit
L2 Cache Size: 262144 bytes
Maximum Texture Dimension Size (x,y,z) 1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
Maximum Layered 1D Texture Size, (num) layers 1D=(16384), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(16384, 16384), 2048 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 32768
Warp size: 32
Maximum number of threads per multiprocessor: 2048
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 1 copy engine(s)
Run time limit on kernels: Yes
Integrated GPU sharing Host Memory: Yes
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
Device supports Unified Addressing (UVA): Yes
Device supports Compute Preemption: No
Supports Cooperative Kernel Launch: No
Supports MultiDevice Co-op Kernel Launch: No
Device PCI Domain ID / Bus ID / location ID: 0 / 0 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 10.0, CUDA Runtime Version = 10.0, NumDevs = 1
Result = PASS
@dremsol this sounds amazing. With balena-engine 19.03 finally merged into its main repo, we're one step closer to making all of this happen in vanilla balenaOS. I have lifted the meta-balena PR out of draft status here: balena-os/meta-balena#1824 and it's under review right now. Once we merge the new engine there, work on this issue should pick up again.
You're using nvidia-container-runtime (the previous iteration of GPU support), while I mostly tried to make this work through https://github.com/NVIDIA/container-toolkit/ which is the approach that Docker "blessed".
I don't see why the mount plugin work shouldn't be possible there, as long as libnvidia-container has the changes on its jetson branch.
Hi @robertgzr, we are very happy to hear that GPU support is moving to production. We will keep an eye on the PR.
Thanks for the suggestion, and it seems you are right. It's a bit hard to follow NVIDIA's footsteps sometimes, but we managed to drop the dependency on the runtime. However, this also drops the inclusion of l4t.csv, which has been solved in nvidia-container-toolkit since libnvidia-container parses the .csv files.
- Why is NVIDIA referring to --runtime nvidia everywhere if this is obsolete?
Based on the work of @acostach in jetson-nano-sample-app, we would like to run all the CUDA samples in a stripped-down version of the Dockerfile to test the --gpus all flag and the plugin mounts. We managed to get ./clock and ./deviceQuery working. However, for the remaining samples involving OpenGL we stumble upon some OpenGL-related errors after building and firing up the container as follows:
balena build -t cudasamples -f Dockerfile.cudesamples .
balena run -it --rm --privileged --gpus all cudasamples bash
And setting DISPLAY and running X
$ export DISPLAY=:0
$ X &
$ ./clock <PASS>
$ ./deviceQuery <PASS>
$ ./postProcessGL <FAIL>
$ ./simpleGL <FAIL>
$ ./simpleTexture3D <FAIL>
$ ./smokeParticles <FAIL>
Failed sample outputs look like:
simpleTexture3D Starting...
GPU Device 0: "NVIDIA Tegra X1" with compute capability 5.3
CUDA error at simpleTexture3D.cpp:247 code=30(cudaErrorUnknown) "cudaGraphicsGLRegisterBuffer(&cuda_pbo_resource, pbo, cudaGraphicsMapFlagsWriteDiscard)"
simpleGL (VBO) starting...
GPU Device 0: "NVIDIA Tegra X1" with compute capability 5.3
CUDA error at simpleGL.cu:422 code=30(cudaErrorUnknown) "cudaGraphicsGLRegisterBuffer(vbo_res, *vbo, vbo_res_flags)"
CUDA error at simpleGL.cu:434 code=33(cudaErrorInvalidResourceHandle) "cudaGraphicsUnregisterResource(vbo_res)"
./postProcessGL Starting...
(Interactive OpenGL Demo)
GPU Device 0: "NVIDIA Tegra X1" with compute capability 5.3
CUDA error at main.cpp:243 code=30(cudaErrorUnknown) "cudaGraphicsGLRegisterBuffer(pbo_resource, *pbo, cudaGraphicsMapFlagsNone)"
CUDA Smoke Particles Starting...
NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.
The following required OpenGL extensions missing:
GL_ARB_multitexture
GL_ARB_vertex_buffer_object
GL_EXT_geometry_shader4.
@acostach, have you seen these errors before? It seems like it has something to do with X, but I can't figure out the cause. dmesg shows the HDMI being plugged and unplugged, and when running the samples the display blinks briefly before the crash. Do you have a clue?
@robertgzr I think I answered my own question: it seems compose doesn't support the --gpus all flag yet, as seen in the following issue.
Anyway, it shouldn't be a problem to install the runtime (--runtime nvidia) alongside the toolkit, as both flags will probably work.
@dremsol
Why is NVIDIA referring to --runtime nvidia everywhere as this is obsolete?
I know, that has been a major pain when researching this topic. I guess plenty of people out there are still using the old approaches... but there are just so many repos that claim to be the one, and container-toolkit, for example, doesn't even come with a README yet is essential for the whole thing to work.
You are right, upstream compose-file support isn't progressing much: https://github.com/docker/compose/pull/7124
But I hope that won't really be an issue, because you can communicate basically the same thing using env vars; check out their base images here.
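For example, with the nvidia runtime installed, something like this sketch should be expressible from a compose file today via environment variables:
balena run --rm -it --runtime nvidia \
  -e NVIDIA_VISIBLE_DEVICES=all \
  -e NVIDIA_DRIVER_CAPABILITIES=all \
  nvcr.io/nvidia/l4t-base:r32.4.2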
@acostach should we try to cut down the set of commits on the WIP branch? We should only need to unmask the cuda recipe, include the container-toolkit, and bump meta-balena, no? I guess the rootfs size needs to be investigated, but I would leave that until the very end...