GPU monitoring (AMD / ATI)

Open nicolargo opened this issue 8 years ago • 26 comments

Description

As was done for NVidia GPUs in issue https://github.com/nicolargo/glances/issues/170, the goal is to extend the GPU plugin to AMD / ATI GPUs.

First of all, we have to find a Python Lib to grab stats, then change the current plugin so that it can monitor AMD / ATI GPUs.

nicolargo avatar Jan 08 '17 09:01 nicolargo

As shared in #994, this and this were the starting points of my research.


First of all, we have to find a Python Lib to grab stats

I don't know of any Python libraries that do this out of the box, but RadeonTop and aticonfig are command-line tools that both grab stats.

They can be used like this to get stats. That code spawns a subprocess running aticonfig and parses its text output. That technique may not be a good long term solution though.
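
A minimal sketch of that subprocess-and-parse technique is shown below. It is not the linked code: the `aticonfig --odgt` invocation and the `Temperature - 45.00 C` output shape are assumptions about the tool, and the helper names are hypothetical.

```python
import re
import subprocess

# Assumed output shape, e.g. "Sensor 0: Temperature - 45.00 C".
TEMP_RE = re.compile(r"Temperature\s*-\s*([0-9.]+)\s*C")


def parse_temperature(text):
    """Return the first temperature found in the tool's text output, or None."""
    match = TEMP_RE.search(text)
    return float(match.group(1)) if match else None


def read_temperature(command=("aticonfig", "--odgt")):
    """Run the command and parse its output; None if the tool cannot be run."""
    try:
        result = subprocess.run(
            command, capture_output=True, text=True, timeout=5, check=False
        )
    except (OSError, subprocess.SubprocessError):
        # Tool not installed, not executable, or it timed out.
        return None
    return parse_temperature(result.stdout)


print(read_temperature())
```

The fragility mentioned above is visible here: any change in the tool's wording breaks the regular expression, which is why a proper library would be preferable.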

kdbanman avatar Jan 08 '17 17:01 kdbanman

Also have a look at https://github.com/asornoso/AMD-GPU-INFO

nicolargo avatar May 26 '17 21:05 nicolargo

Amazing job, guys! The only thing I miss is this AMD/ATI GPU monitoring plugin. This may help. It is old code, but I hope you can use it :) https://github.com/bitshiftio/pyADL or https://github.com/mjmvisser/adl3

hunasdf avatar Dec 20 '17 09:12 hunasdf

@hunasdf Nice! I will have a look at it and come back to you for the testing step, because I do not have any AMD/ATI device...

nicolargo avatar Dec 21 '17 09:12 nicolargo

@nicolargo Of course, I will help you! I am a programmer too, with a little Python knowledge (I am learning it). I have 5 RX 460 4GB cards in a single machine and 1 R9 380X 2GB card in another one. Both machines are running Win 10 now. Just tell me what I should do :)

hunasdf avatar Dec 21 '17 10:12 hunasdf

@hunasdf I just had a quick look at the ADL3 lib (available on PyPI). There is no documentation and the code is very old...

Can you just try this on your machine (from a console) and copy/paste the result:

pip install adl3
python
>>> import adl3
>>> dir(adl3)

Thanks

Note: on my Linux machine, which has no AMD/ATI card, the import fails:

In [2]: import adl3
---------------------------------------------------------------------------
OSError                                   Traceback (most recent call last)
<ipython-input-2-50ec25a0a1b3> in <module>()
----> 1 import adl3

/usr/local/lib/python2.7/dist-packages/adl3/__init__.py in <module>()
----> 1 from .adl_api import *

/usr/local/lib/python2.7/dist-packages/adl3/adl_api.py in <module>()
     40 
     41         # load the ADL 3.0 dso/dll
---> 42         _libadl = CDLL("libatiadlxx.so", mode=RTLD_GLOBAL)
     43 
     44         # ADL requires we pass an allocation function and handle freeing it ourselves

/usr/lib/python2.7/ctypes/__init__.pyc in __init__(self, name, mode, handle, use_errno, use_last_error)
    360 
    361         if handle is None:
--> 362             self._handle = _dlopen(self._name, mode)
    363         else:
    364             self._handle = handle

OSError: libatiadlxx.so: cannot open shared object file: No such file or directory
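
The traceback shows that `import adl3` itself raises `OSError` when the ADL shared library is missing. A minimal sketch of how a caller could guard against that is shown below; the `load_adl3` helper name is hypothetical, and other exception types may be possible on other platforms.

```python
def load_adl3():
    """Return the adl3 module, or None when it cannot be used on this machine."""
    try:
        # Raises ImportError if the package is not installed, and OSError if
        # the package is installed but the ADL shared library cannot be loaded.
        import adl3
    except (ImportError, OSError):
        return None
    return adl3


adl = load_adl3()
print("ADL available" if adl is not None else "ADL not available")
```

With a guard like this, a machine without an AMD/ATI driver would simply report no AMD GPU stats instead of crashing at import time.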

nicolargo avatar Dec 21 '17 20:12 nicolargo

@nicolargo Here is the result on Win 10 with an R9 380X (Python 2.7). I can also give you access to this machine if that helps :)

adl3.txt

hunasdf avatar Dec 22 '17 08:12 hunasdf

@hunasdf Yes, that would be nice, but without any documentation it will not be easy...

nicolargo avatar Dec 22 '17 08:12 nicolargo

@nicolargo I checked this lib and I am not sure... Maybe we can use a part of it. I tried to use "atitweak -l" to simply list the adapters and I got an error. No problem, I started to debug the issue. I found that the failure is in ADL_Adapter_ID_Get (ADL_Adapter_NumberOfAdapters_Get and ADL_Adapter_AdapterInfo_Get may be OK). I started to search for a reference or anything about it, and I found this page: https://developer.amd.com/display-library-adl-sdk/ The latest ADL version is 10 (vs. 3, which is what I tried to use). I downloaded it and the zip contains full documentation. After a little searching I found ADL2_Adapter_Active_Get. So the function has been renamed and has a different parameter list too. I now have some results, and if I have a little time, I will continue the debugging/development.

Ps.: Sorry for my English, I hope you understand it :)

hunasdf avatar Dec 22 '17 20:12 hunasdf

@hunasdf thanks for your time and do not worry about your English ;)

I want to keep plugin-specific code outside the Glances source project, so I prefer to have an external ATI/AMD Python lib. If you want, I can initialize a "Git repository skeleton" for this new lib where you can test your development. What do you think?

nicolargo avatar Dec 22 '17 20:12 nicolargo

It is a great idea, let's try it!

hunasdf avatar Dec 22 '17 21:12 hunasdf

@nicolargo Good news: something happened! I updated the ATI/AMD driver to 17.12 (blockchain driver) and restarted the machine. I also used ADL2_Adapter_ID_Get instead of ADL_Adapter_ID_Get. Here is the output:

adl3-master>python atitweak -l
0. AMD Radeon (TM) R9 380 Series (\.\DISPLAY4)
    engine clock range is 150 - 1200MHz
    memory clock range is 75 - 1750MHz
    core voltage range is 0 - 0VDC
    performance level 0: engine clock 300MHz, memory clock 150MHz, core voltage 0VDC
    performance level 1: engine clock 985MHz, memory clock 1400MHz, core voltage 0VDC
    fan speed range: 0 - 100%, 0 - 6000 RPM
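
As a rough illustration of how this listing could be turned into structured data, here is a sketch that parses the text above with regular expressions. The function name and the returned dictionary layout are hypothetical, and the patterns only cover the fields visible in this one sample (including the Windows-style `DISPLAY` suffix).

```python
import re

# Sample copied from the atitweak output above.
SAMPLE = (
    "0. AMD Radeon (TM) R9 380 Series (\\.\\DISPLAY4) "
    "engine clock range is 150 - 1200MHz "
    "memory clock range is 75 - 1750MHz "
    "core voltage range is 0 - 0VDC "
    "performance level 0: engine clock 300MHz, memory clock 150MHz, core voltage 0VDC "
    "performance level 1: engine clock 985MHz, memory clock 1400MHz, core voltage 0VDC "
    "fan speed range: 0 - 100%, 0 - 6000 RPM"
)

# The adapter name itself contains "(TM)", so anchor on the trailing display id.
ADAPTER_RE = re.compile(r"(\d+)\.\s+(.+?)\s+\(\S*DISPLAY\d+\)")
RANGE_RE = re.compile(r"(engine|memory) clock range is (\d+) - (\d+)MHz")
LEVEL_RE = re.compile(
    r"performance level (\d+): engine clock (\d+)MHz, memory clock (\d+)MHz"
)


def parse_atitweak(text):
    """Extract adapter index, name, clock ranges and performance levels."""
    adapter = ADAPTER_RE.search(text)
    info = {
        "index": int(adapter.group(1)) if adapter else None,
        "name": adapter.group(2) if adapter else None,
        "clock_ranges_mhz": {},
        "levels": [],
    }
    for kind, low, high in RANGE_RE.findall(text):
        info["clock_ranges_mhz"][kind] = (int(low), int(high))
    for level, engine, memory in LEVEL_RE.findall(text):
        info["levels"].append(
            {"level": int(level), "engine_mhz": int(engine), "memory_mhz": int(memory)}
        )
    return info


print(parse_atitweak(SAMPLE))
```

Because the patterns search the whole text, they work whether the output arrives on one line or several.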

hunasdf avatar Dec 23 '17 08:12 hunasdf

@hunasdf As already done for NVidia (see http://glances.readthedocs.io/en/stable/aoa/gpu.html) the Glances GPU plugin should display the following information for the AMD/ATI GPU:

  1. GPU name ==> It looks like the information is available ("0. AMD Radeon (TM) R9 380 Series (.\DISPLAY4)"). I think that the "0." is the GPU number?
  2. GPU process load in %. I do not see this stat in the output of the atitweak command line. Perhaps it can be derived from "engine clock range is 150 - 1200MHz" (150*100/1200)? What is the difference with the performance level lines?
  3. Memory consumption in %. I do not see it. Is there another method to grab this stat from the ADL lib?
  4. Additionally, we can also add the fan speed to the Sensor plugin (what about the temperature sensor?)
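
For reference, the expression in item 2 works out as follows. This is only the arithmetic: the two numbers are the bounds of the engine clock range, so the result is a constant for a given card rather than a load measurement. A clock-based estimate would need the current engine clock, which the output above does not show. The helper name is hypothetical.

```python
def percent_of_max(value, maximum):
    """Return value as a percentage of maximum (0.0 when maximum is zero)."""
    return 100.0 * value / maximum if maximum else 0.0


# Lower bound of the engine clock range divided by its upper bound.
print(percent_of_max(150, 1200))  # 12.5
```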

@hunasdf I just created a new GitHub repository with the skeleton for the future lib. I called it PyADL (https://github.com/nicolargo/pyadl). Please click here to be added as a collaborator. You can now clone it, then commit and push your development to it. Let me know if you need additional information.

nicolargo avatar Dec 24 '17 09:12 nicolargo

@nicolargo I think this information is available in the ADL. I have seen it somewhere. The next step will be to grab it there. (The info in my last comment was just the output that atitweak generated. I was simply happy that we can use this "old" adl3 lib :)) And thanks for the invitation, I accepted it!

hunasdf avatar Dec 24 '17 10:12 hunasdf

I committed the first alpha version. With this, you can grab the described info except memory consumption. If something is not okay with the code, the license, or anything else, just write to me! :)

hunasdf avatar Dec 26 '17 12:12 hunasdf

Hi @hunasdf

My comments here: https://github.com/nicolargo/pyadl/issues/1

Thanks !

nicolargo avatar Dec 26 '17 15:12 nicolargo

@hunasdf Any news concerning PyADL? Do you think that it will be available soon? If not, I have to postpone the implementation of the Glances AMD GPU support... Let me know.

nicolargo avatar Feb 03 '18 15:02 nicolargo

@nicolargo I have not worked on it (I did not have enough time). In my opinion, we must find another solution for Linux. The latest ADL SDK came out on "08/10/2016", so... I am not happy about it, but yes, we have to postpone the integration :( What do you think: is it a problem if we use PyADL for Windows and a totally different solution, for example sysfs, for Linux?

hunasdf avatar Feb 04 '18 08:02 hunasdf

From a Glances dependencies point of view it is a problem: Glances is cross-platform software, so the difference between Windows and other OSes should be managed by the lib (PyADL), not by Glances.

nicolargo avatar Feb 06 '18 20:02 nicolargo

Of course, PyADL should manage it. But if PyADL uses something other than ADL, its name may be misleading.

hunasdf avatar Feb 06 '18 21:02 hunasdf

@nicolargo For GPU temps, it seems the normal systems input for telegraf pulls ATI GPU Temp sensors as well, if that's some sort of hint of where you could grab that info.

douglasg14b avatar Jul 25 '19 21:07 douglasg14b

I know this is an older issue but I thought I would share

https://github.com/kdschlosser/ati_radeon

kdschlosser avatar May 24 '20 07:05 kdschlosser

Is GPU monitoring for AMD GPUs implemented yet? Thanks

johntiger1 avatar May 09 '21 01:05 johntiger1

@johntiger1 Nope, because there is no cross-platform Python lib that grabs AMD GPU stats correctly.

nicolargo avatar May 09 '21 14:05 nicolargo

Apologies for helicoptering into this issue...Python is not my thing. I grabbed a couple of Z600s that came with old AMD cards, so started searching to see if Glances could support it and ended up here. Because this issue is old, I did a search on PyPI and found pyamdgpuinfo, which appears to be an active (in beta) project. No idea if this helps or not, but thought I'd share. As always, thank you @nicolargo for this amazing app.

imdebating avatar Dec 21 '21 15:12 imdebating

@imdebating https://pypi.org/project/pyamdgpuinfo/ looks great! I will have a look :) Thanks for the heads-up!

nicolargo avatar Dec 21 '21 15:12 nicolargo

Have a look at:

  • https://github.com/Umio-Yasuno/amdgpu_top
  • https://github.com/ROCm/amdsmi
  • https://github.com/clbr/radeontop

nicolargo avatar Mar 26 '24 18:03 nicolargo

@nicolargo nvtop is nice https://news.ycombinator.com/item?id=39687132

works for amd too (or for me at least)

PhilipDeegan avatar Mar 26 '24 19:03 PhilipDeegan

From the NVtop documentation: "NVTOP supports AMD GPUs using the amdgpu driver through the exposed DRM and sysfs interface.

AMD introduced the fdinfo interface in kernel 5.14 (browse kernel source). Hence, you will need a kernel with a version greater or equal to 5.14 to see the processes using AMD GPUs.

Support for recent GPUs are regularly mainlined into the linux kernel, so please use a recent-enough kernel for your GPU."
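
To make the fdinfo part concrete, here is a small sketch that reads `/proc/<pid>/fdinfo/*` and keeps the entries belonging to the amdgpu driver. The `drm-driver` key follows the kernel's DRM client usage stats format as I understand it; key names may differ between kernel versions, so treat them as assumptions. The function names are hypothetical.

```python
import os


def parse_fdinfo(text):
    """Parse the 'key: value' lines of an fdinfo file into a dict of strings."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def amdgpu_fdinfo(pid):
    """Yield parsed fdinfo dicts for file descriptors of pid opened on amdgpu."""
    base = f"/proc/{pid}/fdinfo"
    try:
        names = os.listdir(base)
    except OSError:
        # No /proc, no such process, or no permission.
        return
    for name in names:
        try:
            with open(os.path.join(base, name)) as handle:
                fields = parse_fdinfo(handle.read())
        except OSError:
            # The descriptor may have been closed in the meantime.
            continue
        if fields.get("drm-driver") == "amdgpu":
            yield fields


for entry in amdgpu_fdinfo(os.getpid()):
    print(entry)
```

Reading another user's processes this way usually requires elevated privileges.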

Additional information:

  • https://www.kernel.org/doc/html/v6.1/gpu/amdgpu/thermal.html
  • https://wiki.archlinux.org/title/AMDGPU#Manually => https://gist.github.com/nicolargo/639fb23baaedf3c7ce29f4f9de88548b
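
For readers without the Gist at hand, the sysfs approach can be sketched as follows. This is not the Gist's content: the file names (`gpu_busy_percent`, `mem_info_vram_used`, `mem_info_vram_total`, `hwmon/hwmon*/temp1_input` in millidegrees Celsius) and the AMD PCI vendor id `0x1002` are what I understand the amdgpu driver to expose, so treat them as assumptions to check against the kernel documentation linked above. The function names are hypothetical.

```python
import glob
import os
import re


def read_int(path):
    """Return the integer stored in a sysfs file, or None if unreadable."""
    try:
        with open(path) as handle:
            return int(handle.read().strip())
    except (OSError, ValueError):
        return None


def is_amd(device_dir):
    """True when the PCI vendor id of the device is AMD's (assumed 0x1002)."""
    try:
        with open(os.path.join(device_dir, "vendor")) as handle:
            return handle.read().strip().lower() == "0x1002"
    except OSError:
        return False


def find_amd_devices(root="/sys/class/drm"):
    """List device directories of cardN entries (not connectors) owned by AMD."""
    devices = []
    for card in sorted(glob.glob(os.path.join(root, "card*"))):
        device = os.path.join(card, "device")
        if re.fullmatch(r"card\d+", os.path.basename(card)) and is_amd(device):
            devices.append(device)
    return devices


def read_amdgpu_stats(device_dir):
    """Read load, VRAM usage and temperature; missing values become None."""
    used = read_int(os.path.join(device_dir, "mem_info_vram_used"))
    total = read_int(os.path.join(device_dir, "mem_info_vram_total"))
    temps = glob.glob(os.path.join(device_dir, "hwmon", "hwmon*", "temp1_input"))
    temp_milli = read_int(temps[0]) if temps else None
    return {
        "load_percent": read_int(os.path.join(device_dir, "gpu_busy_percent")),
        "mem_percent": 100.0 * used / total if used is not None and total else None,
        "temperature_c": temp_milli / 1000.0 if temp_milli is not None else None,
    }


for device in find_amd_devices():
    print(device, read_amdgpu_stats(device))
```

On a machine without an AMD GPU the loop prints nothing, which matches the "no device, no stats" behaviour one would want from a plugin.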

nicolargo avatar Apr 02 '24 09:04 nicolargo

Can anyone with an AMD GPU copy/paste the result of the command available in the Gist: https://gist.github.com/nicolargo/639fb23baaedf3c7ce29f4f9de88548b (please add a comment directly in the Gist)?

cc: @PhilipDeegan

nicolargo avatar Apr 02 '24 11:04 nicolargo