card0 isn't a valid path, sometimes GPU is mounted as card1

Open Vixtron opened this issue 2 years ago • 7 comments

image

Expected behavior: GPU clocks are set on card0 using the settings in /etc/default/amdgpu-custom-states.card0

Actual behavior:

Nov 01 14:34:27 WS.local amdgpu-clocks[13908]: ls: cannot access '/sys/class/drm/card0/device/hwmon': No such file or directory
Nov 01 14:34:27 WS.local amdgpu-clocks[13902]: WARNING: /sys/class/drm/card0/device/pp_od_clk_voltage does not exist, skipping!

The solution could be to use the actual PCI path instead of a dynamic path.
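One way to picture the PCI-path idea is a lookup that maps a fixed PCI address to whatever cardN name the kernel assigned on this boot. This is a minimal sketch, not part of amdgpu-clocks: the function name `find_card_by_pci` is hypothetical, and it assumes that `/sys/class/drm/cardN/device` is a symlink to the PCI device directory. The second argument exists only so the function can be tried against a mock directory tree.

```shell
#!/bin/sh
# Hypothetical helper (not part of amdgpu-clocks): print the cardN name
# whose "device" link resolves to the given PCI address.
# $1 = PCI address, e.g. the first column of `lspci -D`
# $2 = DRM sysfs root (defaults to /sys/class/drm)
find_card_by_pci() {
    pci_addr="$1"
    drm_root="${2:-/sys/class/drm}"
    for card in "$drm_root"/card[0-9]*; do
        # Skip connector entries such as card0-DP-1
        case "${card##*/}" in *-*) continue ;; esac
        [ -e "$card/device" ] || continue
        # Resolve the symlink; the last path component is the PCI address
        resolved=$(readlink -f "$card/device")
        if [ "${resolved##*/}" = "$pci_addr" ]; then
            echo "${card##*/}"
            return 0
        fi
    done
    return 1
}
```

For example, `find_card_by_pci 0000:03:00.0` would print `card0` or `card1` depending on the current boot, where `0000:03:00.0` is a placeholder address to be replaced with the one reported by `lspci -D`.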

Vixtron avatar Nov 01 '22 13:11 Vixtron

hi @Vixtron

Thanks for reporting, but this particular issue has nothing to do with this project. amdgpu-clocks does not assign cardX numbers itself; the Linux kernel and its driver modules do that. Speaking of which, what cards do you have in your system, and which kernel and drivers are you using for them? Do you perhaps use an external Thunderbolt GPU enclosure, or some kind of notebook with a combination of iGPU and dGPU, or similar?

As a potential workaround: verify which of your cards you are changing clocks for, double-check that it is the same card that keeps toggling between card0 and card1, and then just symlink /etc/default/amdgpu-custom-state.card0 to /etc/default/amdgpu-custom-state.card1. That would ensure the same settings are applied to your card, regardless of whether the kernel sees it as card0 or card1.
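The symlink step could be sketched roughly as follows. The function name `link_states_file` is hypothetical. The thread spells the file name both as amdgpu-custom-state and amdgpu-custom-states, so the base name is a parameter here (defaulting to the spelling from the original report) and should be adjusted to whatever exists on the system. The sketch refuses to overwrite an existing card1 file.

```shell
#!/bin/sh
# Hypothetical helper: make <base>.card1 a symlink to <base>.card0 so the
# same settings file is found under either card number.
# $1 = directory holding the settings files (defaults to /etc/default)
# $2 = base file name (ASSUMPTION: adjust to the spelling on your system)
link_states_file() {
    states_dir="${1:-/etc/default}"
    base="${2:-amdgpu-custom-states}"
    src="$base.card0"
    dst="$base.card1"
    if [ ! -f "$states_dir/$src" ]; then
        echo "missing $states_dir/$src" >&2
        return 1
    fi
    # Do not clobber an existing card1 file or link
    if [ -e "$states_dir/$dst" ] || [ -L "$states_dir/$dst" ]; then
        echo "$states_dir/$dst already exists" >&2
        return 1
    fi
    # Relative link, created from inside the directory
    ( cd "$states_dir" && ln -s "$src" "$dst" )
}
```

Writing to /etc/default normally requires root, so on a real system the function would be run from a root shell.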

sibradzic avatar Nov 02 '22 05:11 sibradzic

I only have 1 dedicated card, an RX580, and I'm using the amdgpu driver. Since I updated to kernel 6.0.5, I noticed after rebooting that my card was mounted as card1 and my clocks were not being applied.

Vixtron avatar Nov 02 '22 09:11 Vixtron

I only have 1 dedicated card - RX580

Your screenshot suggests otherwise. What does ls -alh /sys/class/drm say? And lspci?

and I'm using the amdgpu driver

Yes, of course, but which amdgpu driver? Mainline kernel, distro specific, pro, something else? What does modinfo amdgpu say?

Tried the workaround?
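The information asked for above (which cards exist, at which PCI address, bound to which driver) can also be collected in one pass from sysfs. This is a rough sketch under the assumption that each `cardN/device` entry is a symlink to a PCI device directory containing a `driver` link; the function name `list_drm_cards` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print "cardN <pci-address> <driver>" for every DRM card.
# $1 = DRM sysfs root (defaults to /sys/class/drm)
list_drm_cards() {
    drm_root="${1:-/sys/class/drm}"
    for card in "$drm_root"/card[0-9]*; do
        # Skip connector entries such as card0-HDMI-A-1
        case "${card##*/}" in *-*) continue ;; esac
        [ -e "$card/device" ] || continue
        pci_path=$(readlink -f "$card/device")
        driver="unknown"
        if [ -e "$card/device/driver" ]; then
            # The driver link points at the bound kernel driver's directory
            driver_path=$(readlink -f "$card/device/driver")
            driver="${driver_path##*/}"
        fi
        printf '%s %s %s\n' "${card##*/}" "${pci_path##*/}" "$driver"
    done
}
```

Running `list_drm_cards` after several reboots and comparing the output would show whether the same PCI device really alternates between card0 and card1.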

sibradzic avatar Nov 02 '22 11:11 sibradzic

image

lspci output:

image

I'm running the open-source kernel amdgpu driver, of course: /lib/modules/6.0.5-200.fc36.x86_64/kernel/drivers/gpu/drm/amd/amdgpu/amdgpu.ko.xz

image

Now you can see my GPU is mounted as card0 after I rebooted the PC; next time I reboot it will be card1 for some reason. No, I haven't tried a workaround. Someone told me to try symlinking the card1 to card0 in case it mounts itself wrong, but I don't see that as a good idea. As for the PCI path, I would assume it would have the same issue.

Vixtron avatar Nov 02 '22 12:11 Vixtron

next time I reboot it will be card1 for some reason

When that happens, what do ls -alh /sys/class/drm and lspci say?

someone told me to try symlinking the card1 to card0 in case it mounts itself wrong but I don't see that as a good idea

Someone told you what exactly? What about the potential workaround I told you about? So far it is the only idea that can help your case, please try that.

sibradzic avatar Nov 03 '22 03:11 sibradzic

Today I rebooted and it shows this:

image

image

I don't think your symlink solution will work, because the symlink will be overridden by card0 or card1 each time the PC reboots. Maybe it would work if I could symlink the directories card1 -> card0 and card0 -> card1, but I don't know if that is possible.

Vixtron avatar Nov 03 '22 17:11 Vixtron

I don't think your symlink solution will work, because the symlink will be overridden by card0 or card1 each time the PC reboots. Maybe it would work if I could symlink the directories card1 -> card0 and card0 -> card1, but I don't know if that is possible.

If you bother to read it properly, you'll come to understand that I ain't suggesting symlinking any /sys/class/drm directories at all; that wouldn't make any sense...

What I am suggesting is to make a symlink (or just a plain good old copy) of an amdgpu-custom-state file, so that amdgpu-clocks would try to apply identical custom settings to both card0 and card1 every time it runs. Obviously, that would work for just one card, depending on which identifier the driver has currently assigned to it (it would just throw an error about the other, missing, card identifier), but it should give you the result you want.
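The effect described above can be illustrated with a dry run that only reports what would happen to each settings file. This is not the actual amdgpu-clocks code; it is a sketch that mirrors the "pp_od_clk_voltage does not exist, skipping!" warning from the log in the original report. The function name `plan_states` and the default base file name are assumptions.

```shell
#!/bin/sh
# Hypothetical dry run: for each <base>.cardN settings file, report whether
# a matching card with an overdrive interface exists right now.
# $1 = directory holding the settings files (defaults to /etc/default)
# $2 = DRM sysfs root (defaults to /sys/class/drm)
# $3 = base file name (ASSUMPTION: adjust to the spelling on your system)
plan_states() {
    states_dir="${1:-/etc/default}"
    drm_root="${2:-/sys/class/drm}"
    base="${3:-amdgpu-custom-states}"
    for f in "$states_dir/$base".card*; do
        [ -e "$f" ] || continue
        # Text after the last dot is the card name, e.g. card0
        card="${f##*.}"
        if [ -e "$drm_root/$card/device/pp_od_clk_voltage" ]; then
            echo "apply $card"
        else
            echo "skip $card"
        fi
    done
}
```

With both a card0 and a card1 settings file present, whichever identifier the AMD card holds on a given boot is reported as applied, and the other one is skipped.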

sibradzic avatar Nov 03 '22 20:11 sibradzic

Using a symlink works. I have an AMD card plus the Intel iGPU. The card numbers seem to be random on each boot. I went into /etc/default and did ln -s amdgpu-custom-states.card0 amdgpu-custom-states.card1, and the settings get applied no matter which card number gets assigned. The downside is that it will try to apply the settings to the Intel card, fail, and then apply them to the AMD card.

walmartshopper avatar Dec 16 '22 18:12 walmartshopper