Add GPU utilization meter

Open ThyrixYang opened this issue 4 years ago • 16 comments

I think it would be nice to have a usage bar for GPUs, like the one for CPUs. The only available option is the nvidia-smi command, but it's awkward.

ThyrixYang avatar Dec 16 '20 08:12 ThyrixYang

While this sounds interesting, it'd be appreciated if we could keep this as vendor-neutral as possible. Thus an implementation should cover most systems regardless of the vendor supplying the hardware.

Also, as was done with some other features that rely on external libraries, the implementation should try to load the required libraries dynamically at runtime, or even avoid external libraries altogether when possible.

BenBE avatar Dec 16 '20 08:12 BenBE

https://github.com/rib/gputop for inspiration

fasterit avatar Dec 16 '20 09:12 fasterit

It seems that the gputop mentioned by @fasterit only supports Intel GPUs. As someone who trains deep learning models, I'm looking for a monitor for NVIDIA GPUs. I believe discrete graphics cards need monitoring more than Intel GPUs, since we often work with multiple NVIDIA GPUs. AMD GPUs are not usable for deep learning training, at least for now, so I think supporting only NVIDIA GPUs would already be very useful for us. It would be even better if AMD and Intel GPUs could be supported as well.

ThyrixYang avatar Dec 16 '20 14:12 ThyrixYang

I wish that I could write a script inside htoprc, such that:

...
GpuTemp_handler=$(nvidia-smi --query-gpu=temperature.gpu --format=csv,noheader,nounits)
...
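
For comparison, here is a minimal sketch of what such a handler would have to do outside of htop, assuming the NVIDIA driver tools are installed. The function names `parse_gpu_temps` and `read_gpu_temps` are hypothetical; only the nvidia-smi command line is taken from the snippet above.

```python
import subprocess


def parse_gpu_temps(csv_text: str) -> list:
    """Parse one integer temperature (degrees C) per non-empty output line.

    Raises ValueError if nvidia-smi reports a non-numeric value for a GPU.
    """
    return [int(line.strip()) for line in csv_text.splitlines() if line.strip()]


def read_gpu_temps() -> list:
    """Run the nvidia-smi query quoted above and return one temperature per GPU."""
    result = subprocess.run(
        [
            "nvidia-smi",
            "--query-gpu=temperature.gpu",
            "--format=csv,noheader,nounits",
        ],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if nvidia-smi fails
    )
    return parse_gpu_temps(result.stdout)
```

With two GPUs, `read_gpu_temps()` would return a two-element list. Spawning a process on every refresh is comparatively expensive, which is one reason a built-in meter is preferable to a shell hook.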

stulluk avatar Dec 19 '20 21:12 stulluk

If I had to use this feature I'd expect the results to show me the utilisation of the GPU for doing work with CUDA, OpenGL, OpenCV, Xtend, or OpenCL. Also, I don't see how a busy GPU would have the same metrics consideration as for a busy CPU.

danie-dejager avatar Feb 12 '21 12:02 danie-dejager

There is an issue on the previous version of the repo: https://github.com/hishamhm/htop/issues/899

Implementing generic custom meters should solve the problem of vendor neutrality.

sevagh avatar Feb 13 '21 00:02 sevagh

if (card == nvidia)
    data = nvidia-smi -l --query-gpu=timestamp,temperature.gpu,memory.used,memory.free --format=csv
elif (card == AMD)
    data = (AMD version)
elif (card == intel)
    data = (Intel version)
else
    return "card not supported"

That said, it doesn't have the same tick rate as htop itself, so custom code would probably be required.

I'd be willing to have a go at coding it myself, but I'm not a C programmer and I have no idea where to look in the codebase to add something like this.
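
The vendor check in the pseudocode above could be sketched as follows. This is only an illustration, assuming a Linux system where the GPU driver registers a DRM card under /sys/class/drm (the proprietary NVIDIA driver may not do so in every configuration). The function names are hypothetical; the three PCI vendor IDs are the standard assignments.

```python
from pathlib import Path

# Standard PCI vendor IDs for the three vendors discussed in this thread.
PCI_VENDORS = {"0x10de": "nvidia", "0x1002": "amd", "0x8086": "intel"}


def vendor_name(raw_id: str) -> str:
    """Map the contents of a sysfs 'vendor' file to a vendor name."""
    return PCI_VENDORS.get(raw_id.strip().lower(), "unsupported")


def detect_gpu_vendors(drm_root: str = "/sys/class/drm") -> dict:
    """Return {card name: vendor} for every DRM card found under drm_root."""
    vendors = {}
    for vendor_file in sorted(Path(drm_root).glob("card[0-9]*/device/vendor")):
        card = vendor_file.parts[-3]  # e.g. "card0"
        vendors[card] = vendor_name(vendor_file.read_text())
    return vendors
```

Each vendor would still need its own data source; this only covers the dispatch step.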

NicTanghe avatar Oct 04 '22 13:10 NicTanghe

Hi folks, you can achieve this in htop today if you use the pcp-htop variant and the nvidia PCP metrics. This is probably the best long-term solution here, since the way the metrics are extracted (at least in the case of nvidia) requires a separate daemon, as has been done for the 'gputop server' ... this is also the architecture PCP already provides.

https://man7.org/linux/man-pages/man1/pmdanvidia.1.html
https://man7.org/linux/man-pages/man1/pcp-htop.1.html

natoscott avatar Oct 04 '22 21:10 natoscott

I've installed the packages, but when I run pcp htop I just seem to get an htop which doesn't even show my running processes.

There is also no GPU option in the headers layout menu.

NicTanghe avatar Oct 05 '22 15:10 NicTanghe

[...] I just seem to get an htop which doesn't even show my running processes.

Hmm, maybe an installation issue; everything's working fine here. Can you fetch values via:

pminfo --fetch proc.psinfo.rss

Which Linux distribution? PCP version? Can you paste output from 'pcp summary'?

[...] also no GPU option in headers layout menu.

Yep, you're blazing a trail here - this will involve adding a new text config file alongside the others below pcp/meters/ in the htop repo specifying which metrics you want to display.

natoscott avatar Oct 05 '22 21:10 natoscott

Same issue here as @NicTanghe, on Linux 6.0.2-arch1-1. Also:

pcp-summary: Cannot connect to PMCD on host "local:": Connection refused

ghost avatar Oct 21 '22 23:10 ghost

"Connection refused" means pmcd(1) is not running - try 'systemctl start pmcd' (or equivalent for your local init system). If pmcd isn't available you likely haven't installed pmdanvidia(1) either.

natoscott avatar Oct 22 '22 01:10 natoscott

I've been busy with other stuff and probably won't reply any time soon.

NicTanghe avatar Oct 31 '22 17:10 NicTanghe

See https://github.com/Syllo/nvtop; they seem to have this figured out. htop needs this!

benjamin051000 avatar Aug 24 '23 05:08 benjamin051000

Thank you for this pointer. When we last had a look at the GPU utilization stuff there was no real unified interface available yet. But given that fdinfo seems to be the way to go, this seems to have become reasonable to implement and maintain.

@benjamin051000: Do you mind helping with a PR for initial support for these?
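
To illustrate the fdinfo approach mentioned above, here is a minimal sketch that is not htop's implementation. It assumes the key format described in the Linux kernel's DRM client usage stats documentation, where a driver that supports it reports per-client busy time as `drm-engine-<name>: <value> ns` lines in /proc/<pid>/fdinfo/<fd>. The function names are hypothetical.

```python
import re

# Matches lines such as "drm-engine-gfx:\t1500000000 ns".
_ENGINE_RE = re.compile(r"^drm-engine-([^:\s]+):\s*(\d+)\s*ns\s*$")


def parse_drm_engine_ns(fdinfo_text: str) -> dict:
    """Return {engine name: cumulative busy time in ns} from one fdinfo file."""
    engines = {}
    for line in fdinfo_text.splitlines():
        match = _ENGINE_RE.match(line)
        if match:
            engines[match.group(1)] = int(match.group(2))
    return engines


def utilization_percent(prev_ns: int, curr_ns: int, interval_s: float) -> float:
    """Convert the busy-time delta between two samples into a percentage."""
    return 100.0 * (curr_ns - prev_ns) / (interval_s * 1e9)
```

Because the counters are cumulative, a monitor has to sample them twice and divide the difference by the elapsed wall-clock time, much like CPU time accounting.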

BenBE avatar Aug 24 '23 20:08 BenBE

I know this is scope creep, but adding Apple Metal GPU stats would be awesome too.

djh00t avatar Aug 29 '23 00:08 djh00t

I confirm this works on my Ryzen 5700G out of the box. I compiled htop as follows:

./autogen.sh
./configure --enable-static --enable-sensors
make

It can even show GPU usage time per process!

Thank you so much.

stulluk avatar Mar 29 '24 20:03 stulluk

Sorry, how does one enable this? I have an RX 580 and I can't find the option to enable GPU usage in htop. (I also tried building it manually like you did.)

Samueru-sama avatar May 20 '24 22:05 Samueru-sama

Press F2 and scroll down to "Meters". In the 4th column, you will see "GPU usage". Focus on it and then press ENTER to move it to the 2nd column.

image

Hope this helps.

stulluk avatar May 20 '24 22:05 stulluk

Yeah it is not on my system.

image

Thank you either way, now I know that at least the issue isn't that I couldn't find the option.

Samueru-sama avatar May 20 '24 23:05 Samueru-sama

Not sure if this helps, but I wanted to share my htoprc for you:

stulluk ~ $  cat .config/htop/htoprc 
# Beware! This file is rewritten by htop when settings are changed in the interface.
# The parser is also very primitive, and not human-friendly.
htop_version=3.4.0-dev
config_reader_min_version=3
fields=0 48 17 18 38 39 40 2 46 47 132 49 1
hide_kernel_threads=1
hide_userland_threads=1
hide_running_in_container=0
shadow_other_users=0
show_thread_names=1
show_program_path=1
highlight_base_name=0
highlight_deleted_exe=1
shadow_distribution_path_prefix=0
highlight_megabytes=1
highlight_threads=1
highlight_changes=0
highlight_changes_delay_secs=5
find_comm_in_cmdline=1
strip_exe_from_cmdline=1
show_merged_command=0
header_margin=1
screen_tabs=0
detailed_cpu_time=0
cpu_count_from_one=0
show_cpu_usage=1
show_cpu_frequency=1
show_cpu_temperature=1
degree_fahrenheit=0
update_process_names=0
account_guest_in_cpu_meter=0
color_scheme=0
enable_mouse=1
delay=15
hide_function_bar=0
header_layout=two_50_50
column_meters_0=LeftCPUs2 Memory DiskIO NetworkIO Swap
column_meter_modes_0=1 1 2 2 1
column_meters_1=RightCPUs2 Tasks LoadAverage Systemd GPU
column_meter_modes_1=1 2 2 2 1
tree_view=0
sort_key=46
tree_sort_key=46
sort_direction=-1
tree_sort_direction=-1
tree_view_always_by_pid=0
all_branches_collapsed=0
screen:Main=PID USER PRIORITY NICE M_VIRT M_RESIDENT M_SHARE STATE PERCENT_CPU PERCENT_MEM GPU_PERCENT TIME Command
.sort_key=PERCENT_CPU
.tree_sort_key=PERCENT_CPU
.tree_view_always_by_pid=0
.tree_view=0
.sort_direction=-1
.tree_sort_direction=-1
.all_branches_collapsed=0
screen:I/O=PID USER IO_PRIORITY IO_RATE IO_READ_RATE IO_WRITE_RATE Command
.sort_key=IO_RATE
.tree_sort_key=PID
.tree_view_always_by_pid=0
.tree_view=0
.sort_direction=-1
.tree_sort_direction=1
.all_branches_collapsed=0
stulluk ~ $ 

Other than this, I am just wondering whether you installed "libsensors5" and rebooted, or whether you have run "sensors-detect"?
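
As a quick way to compare configurations, the following sketch checks whether an htoprc lists the GPU meter in any header column. It relies only on the `key=value` layout and the meter name `GPU` visible in the file above; the function names are hypothetical, and a listed meter still only renders if the htop build supports it.

```python
def parse_htoprc(text: str) -> dict:
    """Parse htoprc 'key=value' lines, skipping comments and blank lines.

    Per-screen keys such as '.sort_key' repeat; later values overwrite
    earlier ones, which is fine for inspecting the header meters.
    """
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key] = value
    return settings


def has_gpu_meter(settings: dict) -> bool:
    """True if any column_meters_N entry contains a meter named GPU."""
    return any(
        "GPU" in value.split()
        for key, value in settings.items()
        if key.startswith("column_meters_")
    )
```

Usage would be along the lines of `has_gpu_meter(parse_htoprc(open(path).read()))`, where `path` points at ~/.config/htop/htoprc.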

stulluk avatar May 20 '24 23:05 stulluk

Didn't work, could you share the htop binary that works on your end? It might be that my gpu isn't supported.

Yes, lm_sensors is installed; it is even a dependency of mesa (that's what I assume you meant by libsensors5, since that package isn't available on the Arch-based distro that I use).

This is what I get when running sensors-detect: image

Samueru-sama avatar May 21 '24 04:05 Samueru-sama

htop-x86-64-3.4.0-stulluk-gpu-works-static-compile.tar.gz

I hope it helps.

MD5SUM: f8b654c937c72591d9a3a5599cfd6cef
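
Anyone downloading the archive can compare it against the checksum above. A minimal sketch using Python's standard hashlib follows; the function name is hypothetical. Note that MD5 only guards against accidental corruption, not deliberate tampering.

```python
import hashlib


def file_matches_md5(path: str, expected_hex: str) -> bool:
    """Hash the file in chunks and compare it with the published MD5 digest."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # Read in 64 KiB chunks so large archives are not loaded into memory.
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_hex.strip().lower()
```

The first argument is the path of the downloaded file and the second is the hex string from the MD5SUM line.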

stulluk avatar May 21 '24 20:05 stulluk

Thank you, this works. It looks like I have an issue with libraries on my end, because I just tried to build it again and I can't compile statically on Artix Linux even though I have all the libraries needed.

Omg, this has taken so long that I give up; I can't get this to compile with that feature.

I tried the official Arch package, downloaded the Debian package as well, and also built the htop package statically using GitHub workflows on an Ubuntu machine. None of them gave me an htop with a working GPU meter; the only one that has it is your binary.

Samueru-sama avatar May 21 '24 21:05 Samueru-sama

The GPU meter is not on my htop either. I am using NixOS unstable with htop 3.3.0.

myclevorname avatar Jun 22 '24 14:06 myclevorname

If I am not mistaken, this feature was added in 3.4.0 (see my build version above).
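
A small sketch of that version check, assuming the 3.4.0 threshold stated above is right and that the version appears as three dot-separated numbers (as in `htop_version=3.4.0-dev` or a string like "htop 3.3.0"). The function names are hypothetical.

```python
import re


def parse_htop_version(text: str):
    """Extract (major, minor, patch) from a version string, or None."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    return tuple(int(part) for part in match.groups()) if match else None


def may_have_gpu_meter(version_text: str) -> bool:
    """True if the version is at least 3.4.0 (development builds included)."""
    version = parse_htop_version(version_text)
    return version is not None and version >= (3, 4, 0)
```

A version that passes this check is necessary but not sufficient; as the earlier posts show, the build and platform also matter.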

stulluk avatar Jun 22 '24 14:06 stulluk

Thanks.

myclevorname avatar Jun 22 '24 23:06 myclevorname