vendor-reset

Audio Reset AGAIN?

Status: Open • methanoid opened this issue 5 years ago • 10 comments

In master as of 24c9fc547366e13a96ad497b038d1c378423a39c

Originally posted by @ajmadsen in https://github.com/gnif/vendor-reset/issues/16#issuecomment-766155044

If master now includes the audio reset, I have a problem with my PowerColor RX 5700 Red Dragon. The audio device is preventing resets on unRAID 6.9.2, where I use a Docker container that builds the kernel with the reset patch for me.

2021-04-15T18:13:00.747206Z qemu-system-x86_64: vfio: Cannot reset device 0000:10:00.1, no available reset mechanism.

It does the same with both Windows and Linux guests, and I'm pulling my hair out at a rapid rate! :-( After a fresh boot of the server, a Linux or Windows VM works. If I reboot the VM, OR shut it down and restart it from the VM manager, the VM won't start again.

methanoid, Apr 15 '21

Can you post the output of dmesg (the kernel log)? I'm looking for whether the module actually loads, and what it logs when performing the device reset.

ajmadsen, Apr 15 '21
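To answer "does the module load, and what does it log during a reset?", the kernel log can be filtered for vendor-reset's messages. The sketch below is a hypothetical helper, not part of vendor-reset; it assumes messages have the `vfio-pci <BDF>: AMD_<ASIC>: ...` shape seen in the logs posted in this thread, and it splits on timestamps so it also copes with logs pasted as one long line.

```python
import re

# Kernel log timestamps look like "[  117.825997]". Splitting on them (rather than
# on newlines) also handles logs that were flattened into a single line.
TIMESTAMP = re.compile(r"\[\s*(\d+\.\d+)\]")

# Assumed message shape, taken from the logs in this thread:
#   vfio-pci 0000:10:00.0: AMD_NAVI10: reset result = 0
VENDOR_RESET_MSG = re.compile(
    r"^vfio-pci (?P<bdf>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): "
    r"(?P<asic>AMD_\w+): (?P<text>.*)$"
)


def split_entries(log_text: str) -> list[tuple[float, str]]:
    """Split raw dmesg output into (timestamp, message) pairs."""
    parts = TIMESTAMP.split(log_text)
    # parts[0] is whatever precedes the first timestamp; after that the list
    # alternates between a captured timestamp and the message following it.
    return [(float(parts[i]), parts[i + 1].strip()) for i in range(1, len(parts), 2)]


def summarize_resets(log_text: str) -> dict[str, dict]:
    """Per device: which ASIC handler logged, how many resets started,
    and every 'reset result' code that was reported."""
    summary: dict[str, dict] = {}
    for _, message in split_entries(log_text):
        m = VENDOR_RESET_MSG.match(message)
        if not m:
            continue  # not a vendor-reset device message
        dev = summary.setdefault(
            m.group("bdf"), {"asic": m.group("asic"), "resets": 0, "results": []}
        )
        text = m.group("text")
        if text == "performing reset":  # excludes pre-reset / post-reset lines
            dev["resets"] += 1
        elif text.startswith("reset result = "):
            dev["results"].append(int(text.rsplit("=", 1)[1]))
    return summary
```

An empty summary suggests the module never handled the device; a non-empty one with `results` of 0 means vendor-reset believed the reset succeeded, which (as this thread shows) does not rule out the GPU failing afterwards. The input would typically be the output of `dmesg`, which may require root to read.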

I sent that via email, but I've been trying many things since (passing various BIOS ROM images, and even none).

The VM log always seems to suggest it's the audio reset, but it might not be.

Both the VM log and the dmesg log are attached: Vmlog.txt.txt, log.txt

I'm really stumped. I've tried messing with IOMMU separation (ACS override patches), and with both passing and NOT passing the SPP device that is in the same group as my USB controllers.

Other PCI devices:
AMD Starship/Matisse Reserved SPP | Non-Essential Instrumentation (0a:00.0)
AMD Matisse USB 3.0 Host Controller | USB controller (0a:00.1)
AMD Matisse USB 3.0 Host Controller | USB controller (0a:00.3)
Phison Electronics E12 NVMe Controller | Non-Volatile memory controller (0d:00.0)

Back to testing ROMs, I guess.

methanoid, Apr 19 '21

Here's my IOMMU grouping when I DON'T use any ACS patches. At the bottom I see both GPUs (the NVIDIA card in one group, the AMD card in two), but NEITHER audio function shows FLR enabled. At the top is the pair of USB controllers we normally have to pass (allegedly the other one doesn't like being passed), and one of them doesn't show FLR enabled either. I wonder if the BIOS is borked? I'm clutching at straws.

methanoid, Apr 19 '21
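Whether a PCIe function advertises FLR is a single bit in its config space: bit 28 (Function Level Reset Capability) of the Device Capabilities register inside the PCI Express capability, which `lspci -vv` shows as `FLReset+` or `FLReset-`. A minimal sketch that reads it directly, assuming a PCIe function and root access (unprivileged reads of the sysfs `config` file return only the first 64 bytes, which would produce a false "no"). It only checks the PCIe capability; conventional PCI devices can also expose FLR through the Advanced Features capability, which this sketch ignores.

```python
import struct

# Offsets and bits from the PCI / PCIe specs (same names as Linux's pci_regs.h).
PCI_STATUS = 0x06
PCI_STATUS_CAP_LIST = 0x10    # Status bit: device implements a capability list
PCI_CAPABILITY_LIST = 0x34    # Offset of the pointer to the first capability
PCI_CAP_ID_EXP = 0x10         # PCI Express capability ID
PCI_EXP_DEVCAP = 0x04         # Device Capabilities, relative to the PCIe capability
PCI_EXP_DEVCAP_FLR = 1 << 28  # Function Level Reset capability bit


def has_flr(config: bytes) -> bool:
    """Return True if the PCIe capability in this config space advertises FLR."""
    if len(config) < 0x40:
        return False
    (status,) = struct.unpack_from("<H", config, PCI_STATUS)
    if not status & PCI_STATUS_CAP_LIST:
        return False
    ptr = config[PCI_CAPABILITY_LIST] & 0xFC
    seen = set()
    # Walk the linked list of standard capabilities, guarding against loops
    # and against pointers that run past the bytes we actually have.
    while ptr and ptr not in seen and ptr + 8 <= len(config):
        seen.add(ptr)
        if config[ptr] == PCI_CAP_ID_EXP:
            (devcap,) = struct.unpack_from("<I", config, ptr + PCI_EXP_DEVCAP)
            return bool(devcap & PCI_EXP_DEVCAP_FLR)
        ptr = config[ptr + 1] & 0xFC  # next-capability pointer
    return False


def device_has_flr(bdf: str) -> bool:
    """Read config space from sysfs, e.g. bdf='0000:10:00.1' (run as root)."""
    with open(f"/sys/bus/pci/devices/{bdf}/config", "rb") as f:
        return has_flr(f.read(256))
```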

You shouldn't need to pass a ROM if you've got multiple GPUs, unless it's for the boot GPU. I've seen the wrong ROM cause the errors you've seen in your logs. Unfortunately, other than that, there's not much I can say beyond the fact that the reset isn't working for your device, as evidenced by the IOMMU timeouts. The log messages about no available reset for the audio device are irrelevant, as they will always show regardless of whether this project does anything special to preserve the audio device during a reset.

ajmadsen, Apr 19 '21

It's the primary GPU, and my intention is to use both GPUs for VMs with unRAID running headless. I guess I should dump my own ROM rather than use one from TechPowerUp. But if I do that and still have this issue... what can I do then?

methanoid, Apr 19 '21
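Before passing a self-dumped ROM to a VM, it can be sanity-checked. Per the PCI Firmware Specification layout, each image in an option ROM starts with the bytes 0x55 0xAA, holds a 16-bit pointer at offset 0x18 to a "PCIR" data structure, and that structure records the vendor/device IDs, the image length in 512-byte units, and a "last image" flag. The helper below is a hypothetical sketch built on that layout, not a tool from this thread; a dump from an AMD card should report vendor ID 0x1002 in its first image.

```python
import struct

ROM_SIGNATURE = b"\x55\xaa"  # every PCI option ROM image starts with 0x55 0xAA
PCIR_POINTER = 0x18          # offset of the 16-bit pointer to the PCI Data Structure


def parse_option_rom(data: bytes) -> list[dict]:
    """Walk the images in a PCI option ROM dump and describe each one."""
    images = []
    offset = 0
    while offset + PCIR_POINTER + 2 <= len(data):
        if data[offset:offset + 2] != ROM_SIGNATURE:
            break  # not (or no longer) a valid ROM image
        (pcir_rel,) = struct.unpack_from("<H", data, offset + PCIR_POINTER)
        pcir = offset + pcir_rel
        # The PCI Data Structure starts with "PCIR"; we read fields up to 0x15.
        if pcir + 0x16 > len(data) or data[pcir:pcir + 4] != b"PCIR":
            break
        vendor, device = struct.unpack_from("<HH", data, pcir + 0x04)
        (length_512,) = struct.unpack_from("<H", data, pcir + 0x10)  # 512-byte units
        code_type = data[pcir + 0x14]  # 0x00 = x86 legacy BIOS, 0x03 = EFI
        indicator = data[pcir + 0x15]  # bit 7 set = last image in the ROM
        images.append({
            "vendor": vendor,
            "device": device,
            "code_type": code_type,
            "size": length_512 * 512,
        })
        if indicator & 0x80 or length_512 == 0:
            break
        offset += length_512 * 512
    return images
```

On Linux, the dump itself usually comes from the device's `rom` file in sysfs, which has to be enabled by writing `1` to it before reading and `0` afterwards.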

I dumped my own ROM... same problem (resetting the VM hangs it).

[  117.543658] ATOM BIOS: 111
[  117.543660] vendor-reset-drm: atomfirmware: bios_scratch_reg_offset initialized to 4c
[  117.801986] vfio-pci 0000:10:00.0: AMD_NAVI10: bus reset disabled? yes
[  117.801998] vfio-pci 0000:10:00.0: AMD_NAVI10: SMU response reg: 0, sol reg: 0, mp1 intr enabled? no, bl ready? yes
[  117.802001] vfio-pci 0000:10:00.0: AMD_NAVI10: performing post-reset
[  117.825997] vfio-pci 0000:10:00.0: AMD_NAVI10: reset result = 0
[  156.781310] vfio-pci 0000:10:00.0: AMD_NAVI10: version 1.1
[  156.781312] vfio-pci 0000:10:00.0: AMD_NAVI10: performing pre-reset
[  156.781447] vfio-pci 0000:10:00.0: AMD_NAVI10: performing reset
[  156.783583] vfio-pci 0000:10:00.0: No more image in the PCI ROM
[  156.783600] ATOM BIOS: 111
[  156.783600] vendor-reset-drm: atomfirmware: bios_scratch_reg_offset initialized to 4c
[  156.783602] vfio-pci 0000:10:00.0: AMD_NAVI10: bus reset disabled? yes
[  156.783606] vfio-pci 0000:10:00.0: AMD_NAVI10: SMU response reg: 1, sol reg: 7edbeca, mp1 intr enabled? yes, bl ready? yes
[  156.783607] vfio-pci 0000:10:00.0: AMD_NAVI10: Clearing scratch regs 6 and 7
[  156.783706] vfio-pci 0000:10:00.0: AMD_NAVI10: begin psp mode 1 reset
[  157.290612] vfio-pci 0000:10:00.0: AMD_NAVI10: mode1 reset succeeded
[  159.058426] vfio-pci 0000:10:00.0: AMD_NAVI10: PSP mode1 reset successful
[  159.058436] vfio-pci 0000:10:00.0: AMD_NAVI10: performing post-reset
[  159.082651] vfio-pci 0000:10:00.0: AMD_NAVI10: reset result = 0
[  159.995477] AMD-Vi: Completion-Wait loop timed out
[  160.121905] AMD-Vi: Completion-Wait loop timed out
[  160.334522] AMD-Vi: Completion-Wait loop timed out
[  160.459346] AMD-Vi: Completion-Wait loop timed out
[  160.583967] AMD-Vi: Completion-Wait loop timed out
[  160.713754] AMD-Vi: Completion-Wait loop timed out
[  160.840491] AMD-Vi: Completion-Wait loop timed out
[  160.871402] iommu ivhd0: AMD-Vi: Event logged [IOTLB_INV_TIMEOUT device=10:00.0 address=0x1001e0e70]
[  160.966770] AMD-Vi: Completion-Wait loop timed out
[  161.115516] AMD-Vi: Completion-Wait loop timed out
[  161.240033] AMD-Vi: Completion-Wait loop timed out
[  161.365943] AMD-Vi: Completion-Wait loop timed out
[  161.490578] AMD-Vi: Completion-Wait loop timed out
[  161.871403] iommu ivhd0: AMD-Vi: Event logged [IOTLB_INV_TIMEOUT device=10:00.0 address=0x1001e0ea0]
[  162.234775] AMD-Vi: Completion-Wait loop timed out
[  162.415268] AMD-Vi: Completion-Wait loop timed out
[  162.631266] AMD-Vi: Completion-Wait loop timed out
[  162.871411] iommu ivhd0: AMD-Vi: Event logged [IOTLB_INV_TIMEOUT device=10:00.0 address=0x1001e0ed0]
[  163.871416] iommu ivhd0: AMD-Vi: Event logged [IOTLB_INV_TIMEOUT device=10:00.0 address=0x1001e0f00]
[  164.871425] iommu ivhd0: AMD-Vi: Event logged [IOTLB_INV_TIMEOUT device=10:00.0 address=0x1001e0f30]
[  164.916804] AMD-Vi: Completion-Wait loop timed out
[  165.097532] AMD-Vi: Completion-Wait loop timed out
[  165.871417] iommu ivhd0: AMD-Vi: Event logged [IOTLB_INV_TIMEOUT device=10:00.0 address=0x1001e0f60]
[  166.871424] iommu ivhd0: AMD-Vi: Event logged [IOTLB_INV_TIMEOUT device=10:00.0 address=0x1001e0f90]
[  167.871429] iommu ivhd0: AMD-Vi: Event logged [IOTLB_INV_TIMEOUT device=10:00.0 address=0x1001e0fc0]
[  168.871439] iommu ivhd0: AMD-Vi: Event logged [IOTLB_INV_TIMEOUT device=10:00.0 address=0x1001e1010]
[  169.598502] AMD-Vi: Completion-Wait loop timed out
[  169.871465] iommu ivhd0: AMD-Vi: Event logged [IOTLB_INV_TIMEOUT device=10:00.0 address=0x1001e1040]
[  169.917686] AMD-Vi: Completion-Wait loop timed out
[  170.043257] AMD-Vi: Completion-Wait loop timed out
[  170.213564] AMD-Vi: Completion-Wait loop timed out
[  170.373198] AMD-Vi: Completion-Wait loop timed out
[  170.871473] iommu ivhd0: AMD-Vi: Event logged [IOTLB_INV_TIMEOUT device=10:00.0 address=0x1001e1070]

It's timing out on the GPU...

methanoid, Apr 19 '21
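The evidence that the reset failed is the stream of AMD-Vi timeouts that follows a "reset result = 0" line. A small hypothetical helper (not from vendor-reset or the kernel) can count them from a pasted log, optionally only after a given timestamp such as that of the last `reset result` line. It assumes the `AMD-Vi: Completion-Wait loop timed out` and `IOTLB_INV_TIMEOUT device=BB:DD.F` message formats shown above.

```python
import re
from collections import Counter

# Timestamps like "[  160.871402]"; "[IOTLB_INV_TIMEOUT ..." does not match
# because it has no leading number, so those messages stay intact.
TIMESTAMP = re.compile(r"\[\s*(\d+\.\d+)\]")
COMPLETION_WAIT = "AMD-Vi: Completion-Wait loop timed out"
IOTLB_TIMEOUT = re.compile(
    r"IOTLB_INV_TIMEOUT device=(?P<dev>[0-9a-f]{2}:[0-9a-f]{2}\.[0-7])"
)


def iommu_timeouts(log_text: str, since: float = 0.0) -> tuple[int, Counter]:
    """Count completion-wait timeouts and per-device IOTLB invalidation
    timeouts in entries whose timestamp is >= since."""
    parts = TIMESTAMP.split(log_text)
    waits = 0
    per_device: Counter = Counter()
    # parts alternates: [prefix, ts1, msg1, ts2, msg2, ...]
    for i in range(1, len(parts), 2):
        if float(parts[i]) < since:
            continue
        message = parts[i + 1]
        if COMPLETION_WAIT in message:
            waits += 1
        for m in IOTLB_TIMEOUT.finditer(message):
            per_device[m.group("dev")] += 1
    return waits, per_device
```

Passing `since=159.082651` (the last `reset result` timestamp in the log above) restricts the count to what happened after vendor-reset reported success.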

I have the same issue even with the 5700 XT as the secondary GPU. The log looks about the same.

Chillidaddy, Aug 15 '21

I'm experiencing the same issue with my RX 580 in Windows VMs; the problem does not occur with macOS guests. I tried kernels 5.10-MANJARO and 5.13-MANJARO, and both show the same problem. The log for the virtual machine spams this until I forcefully shut it down:

2021-08-23T17:30:48.094122Z qemu-system-x86_64: vfio: Cannot reset device 0000:01:00.1, no available reset mechanism.

Running dmesg -w shows the vfio-pci driver spewing this out in a loop. No sign of the vendor-reset driver in these logs:

[  295.440717] vfio-pci 0000:01:00.0: AMD_POLARIS10: CLOCK_CNTL: 0x0, PC: 0x2c8c
[  295.440720] vfio-pci 0000:01:00.0: AMD_POLARIS10: performing post-reset
[  295.476979] vfio-pci 0000:01:00.0: AMD_POLARIS10: reset result = 0
[  296.750451] vfio-pci 0000:01:00.0: AMD_POLARIS10: version 1.1
[  296.750456] vfio-pci 0000:01:00.0: AMD_POLARIS10: performing pre-reset
[  296.750699] vfio-pci 0000:01:00.0: AMD_POLARIS10: performing reset

dev-sda1, Aug 23 '21

Update: this was fixed by no longer passing through my GPU's audio function, only the video function. Of course, it's only a temporary workaround until a fix is (hopefully) pushed to main. Credit to Corrgan on the VFIO Discord for this.

dev-sda1, Aug 24 '21
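For a libvirt-managed VM, "video only" means the domain XML carries a `<hostdev>` entry for the GPU's function .0 and none for the audio function .1. The sketch below generates such an entry for the device address from dev-sda1's log (0000:01:00.0); it assumes libvirt is used and is a hypothetical illustration, not the exact configuration from this thread.

```python
import xml.etree.ElementTree as ET


def hostdev_xml(bdf: str) -> str:
    """Build a libvirt PCI <hostdev> element for one function, e.g. '0000:01:00.0'."""
    domain, bus, rest = bdf.split(":")
    slot, function = rest.split(".")
    hostdev = ET.Element("hostdev", mode="subsystem", type="pci", managed="yes")
    source = ET.SubElement(hostdev, "source")
    ET.SubElement(
        source,
        "address",
        domain=f"0x{domain}",
        bus=f"0x{bus}",
        slot=f"0x{slot}",
        function=f"0x{function}",
    )
    return ET.tostring(hostdev, encoding="unicode")


# Only the video function (.0) is emitted; no entry is created for the
# audio function (.1), so it is not handed to the VM.
print(hostdev_xml("0000:01:00.0"))
```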

@dev-sda1 how did you pass through only the GPU? Is it in a separate IOMMU group?

Sunderland93, Dec 08 '21