Preliminary support for NVIDIA Jetson boards
NVIDIA Jetson is an industrial, Linux-based embedded aarch64 platform with a powerful built-in GPU. It is used for AI tasks, mostly computer vision (CV).
Support is enabled via the --enable-nvidia-jetson switch of the configure script.
All source code related to NVIDIA Jetson lives in the linux/NvidiaJetson.{h,c} source files and is guarded by the 'NVIDIA_JETSON' C preprocessor define, so the code for x86_64 platforms stays unchanged.
Additional functionality added by this commit:
- Fix for the CPU temperature reading. The Jetson device is not supported by libsensors. The CPU has 8 cores with only one temperature sensor for all of them, exposed through a thermal zone file. libsensors may be compiled in or turned off; care was taken so that the build succeeds both with and without libsensors.
- A Jetson GPU meter was added: current load, frequency and temperature.
== Technical details ==
The code tries to locate the correct sensors during application startup. As an example, the sensor locations for NVIDIA Jetson Orin are the following:
- CPU temperature: /sys/devices/virtual/thermal/thermal_zone0/type
- GPU temperature: /sys/devices/virtual/thermal/thermal_zone1/type
- GPU frequency: /sys/class/devfreq/17000000.gpu/cur_freq
- GPU current load: /sys/class/devfreq/17000000.gpu/device/load
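The startup lookup described above could be sketched roughly as follows. This is only an illustration, not the code from NvidiaJetson.c: the function names `zone_type_matches` and `find_thermal_zone`, the `base` and `max_zones` parameters, and the idea of matching the zone's `type` text against a case-insensitive prefix such as "cpu" or "gpu" are all assumptions made for the example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <strings.h>

/* Hypothetical helper: returns 1 if the text read from a thermal zone
 * "type" file starts with the given prefix (case-insensitive), else 0. */
int zone_type_matches(const char *type_text, const char *prefix) {
   return strncasecmp(type_text, prefix, strlen(prefix)) == 0;
}

/* Hypothetical helper: scans <base>/thermal_zone0 .. thermal_zone<max_zones-1>
 * and returns the index of the first zone whose type matches the prefix,
 * or -1 if no such zone is found. */
int find_thermal_zone(const char *base, const char *prefix, int max_zones) {
   char path[256];
   char type_text[64];

   assert(base != NULL && prefix != NULL);

   for (int i = 0; i < max_zones; i++) {
      snprintf(path, sizeof(path), "%s/thermal_zone%d/type", base, i);
      FILE *fp = fopen(path, "r");
      if (!fp)
         continue; /* zone missing or unreadable: try the next index */
      char *ok = fgets(type_text, sizeof(type_text), fp);
      fclose(fp);
      if (ok && zone_type_matches(type_text, prefix))
         return i;
   }
   return -1;
}
```

A caller might run `find_thermal_zone("/sys/devices/virtual/thermal", "cpu", 16)` once at startup and cache the result; the prefix strings and the limit of 16 zones are guesses and would need to be checked against the actual board.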
Units:
- The GPU frequency is provided in Hz and shown in MHz.
- The CPU/GPU temperatures are provided in millidegrees Celsius (degrees Celsius multiplied by 1000) and shown in degrees Celsius.
P.S. The GUI shows all temperatures for NVIDIA Jetson with additional precision compared to the default x86_64 platform.
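The unit handling above amounts to two simple conversions. A minimal sketch follows; the function names and the one-decimal output format are assumptions for illustration and may differ from what the patch actually does.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* devfreq reports the frequency in Hz; the meter shows MHz. */
double hz_to_mhz(unsigned long long hz) {
   return (double)hz / 1e6;
}

/* Thermal zones report millidegrees Celsius; the meter shows degrees. */
double millicelsius_to_celsius(long millicelsius) {
   return (double)millicelsius / 1000.0;
}

/* Hypothetical formatter: keeps one decimal place, as an example of the
 * "additional precision" mentioned above. Returns snprintf's result. */
int format_temperature(char *buf, size_t len, long millicelsius) {
   assert(buf != NULL && len > 0);
   return snprintf(buf, len, "%.1f C", millicelsius_to_celsius(millicelsius));
}
```

For instance, a raw sysfs reading of 45500 would be converted to 45.5 degrees Celsius, and a `cur_freq` value of 1300500000 to 1300.5 MHz.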
== NVIDIA Jetson models ==
Tested on NVIDIA Jetson Orin and Xavier boards.
I fear the option of --enable-nvidia-jetson will make future board-specific customizations add similar configure options. That would make things unmaintainable.
Another problem is the conflict with #1620, which is an attempt to unify the GPU meter structure to one interface.
I looked through the GpuMeter implementation on the 'main' branch. If I understand correctly, it collects information about GPU usage from each running process.
NVIDIA Jetson takes a different approach: it provides separate GPU statistics via sysfs / the custom nvgpu driver.
Since all the NVIDIA Jetson specific code is hidden under the C define 'NVIDIA_JETSON', there should be no code collisions. Semantically, the 'NVIDIA_JETSON' switch could turn off all the future GPU code from #1620 and turn on the Jetson-specific GPU code (all the data is already collected by the nvgpu driver anyway).
You could merge the final version of #1620 first; then I'll figure out how to reuse it correctly during the next long holidays :)
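To illustrate the compile-time switch described here, a minimal sketch of how one define can select between the two GPU code paths. The function names and the returned strings are hypothetical and exist only for this example; neither the patch nor #1620 is known to contain them.

```c
#include <string.h>

/* Hypothetical helper, not part of htop: reports which GPU data source
 * was selected at compile time. */
const char *gpu_backend_name(void) {
#ifdef NVIDIA_JETSON
   /* Built with the Jetson define: use the sysfs / nvgpu statistics. */
   return "jetson-sysfs";
#else
   /* Default build: keep the generic per-process GPU code path. */
   return "generic";
#endif
}

/* Hypothetical helper: returns 1 if the Jetson path was compiled in. */
int gpu_backend_is_jetson(void) {
   return strcmp(gpu_backend_name(), "jetson-sysfs") == 0;
}
```

With this layout a default build never compiles the Jetson branch, which is the isolation property the define is meant to give.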
> I fear the option of --enable-nvidia-jetson will make future board-specific customizations add similar configure options. That would make things unmaintainable.
The main purpose of the commit was to minimize interference with the main code base for the default x86_64 platform. Honestly, I do not want the NVIDIA Jetson board-specific code to be compiled in anywhere else.
What approach would you recommend here?
Two ideas came to my mind.
- The more ideal one: make the board identifier part of the machine type, so we can have --host=aarch64-nvidiajetson-linux-gnu. But that requires your toolchain to be configured with the same machine type identifier, which is sometimes not feasible.
- The less ideal, but easier approach: name the configure option --with-board=nvidia_jetson. This assumes that htop would accept patches for additional board customizations, and I don't know the maintainers' attitude on this.
Update: Oh no. Nvidia didn't use a unique machine type for their GCC cross-toolchain. Reference
@BenBE, as a maintainer, do you agree? If so, I will rework the patch according to idea 2.
There's some internal discussion still going on. We're still discussing which direction we'd like to move forward in.
Changes:
- Rebased. Honestly, I've left all GPU-related code unchanged. It uses a different kernel API; it might work with NVIDIA Jetson one day, who knows?
- Additionally pushed the per-process GPU memory allocation functionality right into the LinuxProcess class / the GPU_MEM field on the main screen. Marked it as experimental, because it works with root privileges only. In short, it reads a special sysfs file inside the kernel/debug directory, exposed by the nvgpu NVIDIA driver, which publishes the dictionary {pid -> gpu_memory}.
Added an Action for this functionality. When the 'g' hot key is pressed, the main screen shows only the processes that are using the GPU right now. With the GPU_MEM field, you see the current GPU load per process. Useful, I guess. I hope you'll use the same approach in your future development.
I've left all the deep details in both the commit message and the NvidiaJetson.c file. Have a look, please. @BenBE
Finally, with the "Jetson GPU" meter, the 'g' hot key and the GPU_MEM field applied, htop looks like this:
Fixed issues, @BenBE, have a look, please.
Polite ping, @BenBE
Polite ping, @BenBE
NP. Haven't forgotten you, just busy right now (day job). Will take a look as things calm down again; this will take a few weeks though.