
Support the new Intel Xe driver for GPU usage

Open cvlc12 opened this issue 9 months ago • 49 comments

Is there an existing issue for this?

  • [x] I have searched the existing issues

Is your feature request related to a problem? Please describe.

Hi,

On my laptop with intel integrated graphics, Resources shows GPU usage when using the i915 driver, but not when using the newer Xe driver.

Not really urgent, but would be nice if it was supported ! FYI, tools like intel_gpu_top still don't support it either...

Have a good day !

System info: 11th Gen Intel® Core™ i7-1165G7 × 8 Intel® Iris® Xe Graphics (TGL GT2) Linux 6.13.4-arch1-1 Gnome 47 Resources 1.7.1

cvlc12 avatar Feb 24 '25 15:02 cvlc12

Related info: https://gitlab.freedesktop.org/drm/igt-gpu-tools/-/issues/153#note_2776042

Apparently it would not work properly on my TGL system, but it could be useful for others anyway!

cvlc12 avatar Feb 24 '25 19:02 cvlc12

Hi, thank you for the issue.

Do you mind sending me a screenshot of the GPU page with the xe driver running? Just so I know what exactly is missing. :)

nokyan avatar Feb 24 '25 19:02 nokyan

Bottom one is Xe Image Image

cvlc12 avatar Feb 24 '25 20:02 cvlc12

Hi, I've implemented support in the xe-support branch. Took a bit longer because I had to refactor much of the GPU per-process detection code. Do you mind checking it out?

nokyan avatar Mar 05 '25 17:03 nokyan
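Editor's note: for readers wondering what per-process GPU detection on xe can involve, here is a minimal sketch. It is not the code from the xe-support branch. It assumes the `drm-cycles-<engine>` and `drm-total-cycles-<engine>` keys that the xe driver documents for `/proc/<pid>/fdinfo/<fd>`; the names `EngineCycles`, `parse_xe_fdinfo` and `utilization` are made up for illustration. Utilization is taken as the change in busy cycles divided by the change in total cycles between two samples.

```rust
use std::collections::HashMap;

/// Cycle counters for one engine class in a single fdinfo snapshot.
/// (Hypothetical type, introduced only for this sketch.)
#[derive(Debug, Clone, Copy, PartialEq, Default)]
pub struct EngineCycles {
    pub busy: u64,
    pub total: u64,
}

/// Extracts per-engine cycle counters from the text of one fdinfo file.
/// Lines that are not `drm-cycles-*` / `drm-total-cycles-*` are ignored.
pub fn parse_xe_fdinfo(text: &str) -> HashMap<String, EngineCycles> {
    let mut engines: HashMap<String, EngineCycles> = HashMap::new();
    for line in text.lines() {
        let Some((key, value)) = line.split_once(':') else { continue };
        // Values with units (for example memory sizes) fail to parse and are skipped.
        let Ok(value) = value.trim().parse::<u64>() else { continue };
        if let Some(engine) = key.strip_prefix("drm-total-cycles-") {
            engines.entry(engine.to_string()).or_default().total = value;
        } else if let Some(engine) = key.strip_prefix("drm-cycles-") {
            engines.entry(engine.to_string()).or_default().busy = value;
        }
    }
    engines
}

/// Fraction of time an engine was busy between two snapshots, in 0.0..=1.0.
/// Returns None if a counter went backwards or no cycles elapsed.
pub fn utilization(prev: EngineCycles, curr: EngineCycles) -> Option<f64> {
    let busy = curr.busy.checked_sub(prev.busy)?;
    let total = curr.total.checked_sub(prev.total)?;
    if total == 0 {
        return None;
    }
    Some((busy as f64 / total as f64).min(1.0))
}
```

A monitor would read the same fdinfo file twice, a sampling interval apart, and call `utilization` per engine. How several engines are combined into a single percentage is a design decision this sketch leaves open.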

Sure but I won't be able to do that until mid March. I'll keep you posted.

cvlc12 avatar Mar 05 '25 17:03 cvlc12

> Sure but I won't be able to do that until mid March. I'll keep you posted.

Sure, no worries. :)

nokyan avatar Mar 05 '25 17:03 nokyan

> Do you mind checking it out?

Hi, seems to work !

Image

The jumps are apparently due to Tigerlake not being properly supported by the xe driver. So maybe someone with a discrete graphics card can double check, but it looks to me like it's working properly.

Thanks !

cvlc12 avatar Mar 20 '25 16:03 cvlc12

> > Do you mind checking it out?
>
> Hi, seems to work !
>
> Image
>
> The jumps are apparently due to Tigerlake not being properly supported by the xe driver. So maybe someone with a discrete graphics card can double check, but it looks to me like it's working properly.
>
> Thanks !

Great to see! You're correct though, we should wait until someone whose GPU properly works with this driver can test it too. Because of that, I don't think this will be included in Resources 1.8, I'm afraid.

nokyan avatar Mar 20 '25 16:03 nokyan

I have an Intel Arc B580 using the Xe driver. Compiling the xe-support branch works with that GPU, where previously there was no activity reported.

Image

ihayhurst avatar Mar 29 '25 23:03 ihayhurst

I have not compiled the xe branch and I can wait, but I think kernel 6.15 will be adding some temperature support for the xe driver. See: https://www.phoronix.com/news/Intel-Xe-SVM-For-Linux-6.15

I noticed above and with my machine (with kernel 6.14) the B580 shows PCIe 1.0 x1 link speed. Breaking this down with lspci -tv shows the tree for the card:

+-03.1-[2d-30]----00.0-[2e-30]--+-01.0-[2f]----00.0 Intel Corporation Device e20b
|                               \-02.0-[30]----00.0 Intel Corporation Device e2f7

The card's bridge at the head is "2d:00.0 PCI bridge: Intel Corporation Device e2ff (rev 01)" and shows a link speed of

LnkSta: Speed 16GT/s, Width x8

which is the correct link speed for the card: PCIe 4.0 x8. The video device e20b, which Resources enumerates, reports PCIe 1.0 x1 as both its link capability and its link status in lspci: LnkSta: Speed 2.5GT/s, Width x1

It might be necessary to walk back up the chain and read the link speed from the parent bridge.

I noticed LACT currently can show VRAM used/total, power usage and clock info.

Also add a line if ReBAR is enabled.

Roquelobster avatar Mar 30 '25 01:03 Roquelobster
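Editor's note: the suggestion above, reading the link speed from the parent bridge, could be sketched roughly as follows. It assumes the standard sysfs attributes `current_link_speed` (for example `16.0 GT/s PCIe`) and `current_link_width`, and relies on the fact that a PCI device's real sysfs directory is nested inside the directories of its upstream bridges. `PcieLink`, `parse_speed_gts`, `pcie_generation` and `link_chain` are hypothetical names, and this is not how Resources is known to implement it.

```rust
use std::fs;
use std::path::{Path, PathBuf};

/// Link status reported by sysfs for one device in the chain.
#[derive(Debug, Clone, PartialEq)]
pub struct PcieLink {
    pub device: PathBuf,
    pub speed_gts: f64,
    pub width: u32,
}

/// Parses strings such as "16.0 GT/s PCIe" into 16.0.
pub fn parse_speed_gts(raw: &str) -> Option<f64> {
    raw.split_whitespace().next()?.parse().ok()
}

/// Maps a per-lane transfer rate in GT/s to a PCIe generation number.
pub fn pcie_generation(speed_gts: f64) -> Option<u8> {
    match (speed_gts * 10.0).round() as u32 {
        25 => Some(1),
        50 => Some(2),
        80 => Some(3),
        160 => Some(4),
        320 => Some(5),
        640 => Some(6),
        _ => None,
    }
}

/// Reads the link attributes of a single sysfs device directory.
fn read_link(dir: &Path) -> Option<PcieLink> {
    let speed = fs::read_to_string(dir.join("current_link_speed")).ok()?;
    let width = fs::read_to_string(dir.join("current_link_width")).ok()?;
    Some(PcieLink {
        device: dir.to_path_buf(),
        speed_gts: parse_speed_gts(&speed)?,
        width: width.trim().parse().ok()?,
    })
}

/// Walks from `device` up through its parent directories, collecting the
/// link status at each level. Stops at the first ancestor that has no
/// readable link attributes. Index 0 is the device itself.
pub fn link_chain(device: &Path) -> Vec<PcieLink> {
    let mut chain = Vec::new();
    let mut current = fs::canonicalize(device).ok();
    while let Some(dir) = current {
        match read_link(&dir) {
            Some(link) => chain.push(link),
            None => break,
        }
        current = dir.parent().map(Path::to_path_buf);
    }
    chain
}
```

Called on something like `/sys/bus/pci/devices/0000:2f:00.0` (the address is only an example taken from the tree above), the function returns the GPU function first and its bridges after it. Which entry best represents the card's external link is left to the caller, since the thread does not settle that.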

Weirdly, the Resources xe-support build wasn't showing more than 59% usage when the GPU was working quite hard, so I compared it to nvtop under the same load, which was showing 100% usage.

Image

ihayhurst avatar Mar 30 '25 15:03 ihayhurst

> I noticed above and with my machine (with kernel 6.14) the B580 shows PCIe 1.0 x1 link speed. Breaking this down with lspci -tv shows the tree for the card:
>
> +-03.1-[2d-30]----00.0-[2e-30]--+-01.0-[2f]----00.0 Intel Corporation Device e20b
> |                               \-02.0-[30]----00.0 Intel Corporation Device e2f7
>
> The card's bridge at the head is "2d:00.0 PCI bridge: Intel Corporation Device e2ff (rev 01)" and shows a link speed of
>
> LnkSta: Speed 16GT/s, Width x8
>
> which is the correct link speed for the card: PCIe 4.0 x8. The video device e20b, which Resources enumerates, reports PCIe 1.0 x1 as both its link capability and its link status in lspci: LnkSta: Speed 2.5GT/s, Width x1
>
> It might be necessary to walk back up the chain and read the link speed from the parent bridge.

Hi @Roquelobster, I worked on the PCIe link speed. Could you check whether the displayed link speed jumps to PCIe 4 under significant GPU load? It could be caused by power saving. My NVIDIA card clocks its link down dynamically as well.

peterdk avatar Apr 03 '25 06:04 peterdk

Similar issues here on my Intel Arc B580:

sudo lshw -c video | grep 'configuration'
configuration: depth=32 driver=xe latency=0 resolution=1920,1080

Image

Image

I have not tried the XE branch yet

Ressk avatar Jun 07 '25 01:06 Ressk
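Editor's note: the `driver=xe` field in the lshw output above can also be read straight from sysfs, where each DRM card's `device/driver` entry is a symlink to the bound kernel driver. The sketch below assumes that layout (for example `/sys/class/drm/card0`); `bound_driver` is a hypothetical helper name, not a Resources function.

```rust
use std::fs;
use std::path::Path;

/// Returns the name of the kernel driver bound to a DRM card directory,
/// such as "xe" or "i915", by resolving the `device/driver` symlink.
/// Returns None if the link is missing or its name is not valid UTF-8.
pub fn bound_driver(card_dir: &Path) -> Option<String> {
    let target = fs::read_link(card_dir.join("device").join("driver")).ok()?;
    Some(target.file_name()?.to_str()?.to_owned())
}
```

For example, `bound_driver(Path::new("/sys/class/drm/card0"))` would be expected to yield `Some("xe")` on the system described above, assuming the GPU is `card0`.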

Image

looks like the XE branch can see... a small amount of the data at least

It also appears to be capped at 50% when displaying data

Ressk avatar Jun 07 '25 01:06 Ressk

Hi, I've pushed to xe-support again. It's mostly refactoring, but I rebased the branch onto the newest main commit, so I can actually start working on it again. Could everyone who has a new-ish Intel GPU that works properly with xe pull that branch and test it? It'd be great if you could include a screenshot of nvtop (or something similar) as well.

nokyan avatar Jul 07 '25 17:07 nokyan

Hi @nokyan, as requested I tested 1.8.0-xe-support/963fe68. It wasn't a thorough test, but the GPU is still showing 50% vs nvtop at 100% (I used FluidX3D as a test GPU load).

Image

ihayhurst avatar Jul 08 '25 07:07 ihayhurst

> hi @nokyan as requested testing 1.8.0-xe-support/963fe68 wasn't a thorough test but GPU still showing 50% vs nvtop at 100 % (used FluidX3d as a test gpu load) Image

Thank you! Could you do further testing with something that doesn't use 100% of the GPU like a 3D game and check whether Resources always shows about half of the actual usage as shown in nvtop?

nokyan avatar Jul 08 '25 20:07 nokyan

> > hi @nokyan as requested testing 1.8.0-xe-support/963fe68 wasn't a thorough test but GPU still showing 50% vs nvtop at 100 % (used FluidX3d as a test gpu load)
>
> Thank you! Could you do further testing with something that doesn't use 100% of the GPU like a 3D game and check whether Resources always shows about half of the actual usage as shown in nvtop?

Games? Not sure I have any! I gave Superposition a go, but that is a benchmark, so it went full tilt too. Here's opening Darktable and flicking through some images, which gets to about 25% max:

Image

ihayhurst avatar Jul 08 '25 21:07 ihayhurst

I've pushed a new commit to xe-support, could someone test it?

nokyan avatar Jul 09 '25 12:07 nokyan

AI inference running in the background https://github.com/user-attachments/assets/7ea49675-aba6-470c-a4bb-a217bf1245c7

adrianboguszewski avatar Jul 10 '25 11:07 adrianboguszewski

> AI inference running in the background https://github.com/user-attachments/assets/7ea49675-aba6-470c-a4bb-a217bf1245c7

Thank you! That looks good as far as I can see. A bit of deviation is to be expected, as you can't get nvtop and Resources to sample usage at exactly the same times. Can you test whether Resources picks up a 3D workload like a game as well?

nokyan avatar Jul 10 '25 11:07 nokyan

Here you are

https://github.com/user-attachments/assets/c609fbd1-06b7-437d-87db-a1f9a07662ab

adrianboguszewski avatar Jul 10 '25 11:07 adrianboguszewski

> Here you are Screencast.From.2025-07-10.13-46-26.mp4

Thanks again, this looks pretty good as well. I'm gonna see if I can find any more stats exposed by xe and get it merged then. If in the meantime other people could test it, that'd be great. :)

nokyan avatar Jul 10 '25 18:07 nokyan

good for me this time too :)

Image

ihayhurst avatar Jul 10 '25 20:07 ihayhurst

According to Phoronix one of the Xe updates for kernel 6.17 will have "- Fan control and voltage information is now exposed via sysfs for the Xe driver. "

I'll add there is a new Xe firmware binary in the Linux firmware repo just for fan control. It might be used in 6.17.

See: https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/log/xe

Roquelobster avatar Jul 11 '25 05:07 Roquelobster

Hi everyone, can someone test the latest commit in xe-support again? I added support for frequency, power and temperature readings.

nokyan avatar Jul 15 '25 14:07 nokyan

Unfortunately, I cannot see any difference:

Image Image

adrianboguszewski avatar Jul 16 '25 09:07 adrianboguszewski

Same !

But this is Tigerlake iGPU so not sure I should expect anything

Image

cvlc12 avatar Jul 16 '25 09:07 cvlc12

Thanks for testing. Given that nvtop doesn't show that data for either of you, I suspect that iGPUs don't expose most of it, though GPU frequency should work. Can you run this Resources debug build with the environment variable RUST_LOG=resources=trace set and look for cur_freq in the logs?

nokyan avatar Jul 16 '25 14:07 nokyan
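Editor's note: as a way to cross-check the `cur_freq` entries in those logs, the frequency can be read by hand from sysfs. The sketch below assumes the `tile*/gt*/freq0/cur_freq` layout that the xe driver documents under the device directory (for example `/sys/class/drm/card0/device`), with values in MHz. Whether Resources reads exactly these files is not stated in the thread, and `find_cur_freq_files` and `read_mhz` are hypothetical names.

```rust
use std::fs;
use std::path::{Path, PathBuf};

/// Lists the subdirectories of `dir` whose names start with `prefix`.
fn subdirs_with_prefix(dir: &Path, prefix: &str) -> Vec<PathBuf> {
    let Ok(entries) = fs::read_dir(dir) else { return Vec::new() };
    entries
        .flatten()
        .map(|entry| entry.path())
        .filter(|path| {
            path.is_dir()
                && path
                    .file_name()
                    .and_then(|name| name.to_str())
                    .map_or(false, |name| name.starts_with(prefix))
        })
        .collect()
}

/// Finds every `tile*/gt*/freq0/cur_freq` file below a device directory,
/// sorted by path so the result is stable across runs.
pub fn find_cur_freq_files(device_dir: &Path) -> Vec<PathBuf> {
    let mut found = Vec::new();
    for tile in subdirs_with_prefix(device_dir, "tile") {
        for gt in subdirs_with_prefix(&tile, "gt") {
            let candidate = gt.join("freq0").join("cur_freq");
            if candidate.is_file() {
                found.push(candidate);
            }
        }
    }
    found.sort();
    found
}

/// Reads a frequency file and returns its value in MHz.
pub fn read_mhz(path: &Path) -> Option<u32> {
    fs::read_to_string(path).ok()?.trim().parse().ok()
}
```

If `find_cur_freq_files` returns an empty list on an iGPU running xe, that would suggest the driver is not exposing the frequency there, which is the case the debug log is meant to confirm or rule out.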

@nokyan this should be a good reference: https://github.com/ulissesf/qmassa as implemented by Intel Xe driver people

adrianboguszewski avatar Jul 16 '25 15:07 adrianboguszewski