
ASMedia Technology Inc. ASM2824 PCIe Gen3 Packet Switch

Open IamMurphy opened this issue 10 months ago • 18 comments

Not sure if you can implement the fix into one of the kernels. I'm trying to use the QNAP QM2-2P10G1TB (10GbE Ethernet plus dual-NVMe card) in my Unraid server. The Ethernet works fine, and the drives show up in the BIOS, but not in Unraid:

```
01:00.0 PCI bridge: ASMedia Technology Inc. ASM2824 PCIe Gen3 Packet Switch (rev 01)
02:00.0 PCI bridge: ASMedia Technology Inc. ASM2824 PCIe Gen3 Packet Switch (rev 01)
02:04.0 PCI bridge: ASMedia Technology Inc. ASM2824 PCIe Gen3 Packet Switch (rev 01)
02:08.0 PCI bridge: ASMedia Technology Inc. ASM2824 PCIe Gen3 Packet Switch (rev 01)
02:0c.0 PCI bridge: ASMedia Technology Inc. ASM2824 PCIe Gen3 Packet Switch (rev 01)
```

https://lore.kernel.org/lkml/[email protected]/t/#mfa9766088a98f71b291fb910c1868214bdb218b2

There is supposedly a fix, but I am unsure how to implement it in a kernel. I think it may be included in the Linux kernel, just not enabled by default.

I apologize if this is outside the scope of your kernel build or if I misunderstood the fix.

IamMurphy avatar Apr 23 '24 13:04 IamMurphy

does dmesg show anything at boot? from what I searched, the pcie switch used in the card is problematic....

thor2002ro avatar Apr 24 '24 05:04 thor2002ro

(dmesg screenshots attached: IMG_0405, IMG_0403)

IamMurphy avatar Apr 24 '24 06:04 IamMurphy

reading the patch: the quirk fails to renegotiate pcie at 5GT/s, but it should then fall back to 2.5 and still work..... might be bios related.... as far as I read, the ASM2824 is really fiddly, and the quirk only works about 80% of the time.... it's not a 100% fix....

thor2002ro avatar Apr 24 '24 15:04 thor2002ro

The drives show up in the BIOS and in Windows without issue, as well as in the QNAP, obviously. Is there any easy way to test if it will work?

IamMurphy avatar Apr 24 '24 16:04 IamMurphy

I do think the card is working to some degree, because I believe the 10Gbps Ethernet is connected to the switch too... are the 2 nvmes showing up after the card on the motherboard? it might be helpful to do a full dmesg dump after boot: "dmesg > log.txt"

but I don't have much hope.... without fiddling with it

you could try pci=assign-busses at boot

did you try the stock Unraid kernel? might be a regression in the newer kernels...

that's all I can think of....
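A minimal sketch of that dump-and-filter step (`filter_pcie_log` is a made-up helper name, and the grep pattern is just illustrative, built from the bus addresses in the lspci output above):

```shell
#!/bin/sh
# Save the boot log, then filter it down to lines mentioning the PCIe
# ports, the NVMe subsystem, the ASM2824, or the bus range behind the switch.
filter_pcie_log() {
  grep -Ei 'pcieport|nvme|asm2824|0000:0[1-6]:' "$1"
}

dmesg > log.txt 2>/dev/null || true   # usually needs root
[ -s log.txt ] && filter_pcie_log log.txt
```

The filtered output is usually short enough to paste straight into an issue comment.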

thor2002ro avatar Apr 25 '24 05:04 thor2002ro

> I do think the card is working to some degree, because I believe the 10Gbps Ethernet is connected to the switch too... are the 2 nvmes showing up after the card on the motherboard? it might be helpful to do a full dmesg dump after boot: "dmesg > log.txt"
>
> but I don't have much hope.... without fiddling with it
>
> you could try pci=assign-busses at boot
>
> did you try the stock Unraid kernel? might be a regression in the newer kernels...
>
> that's all I can think of....

Yes, after the NIC: 02:04 and 02:08 for the SSDs.

IamMurphy avatar Apr 25 '24 05:04 IamMurphy

MassNas Diagnostics 20240621.zip

Using the latest kernel, here is the diagnostic data. I still can't get the NVMes to show up in Unraid, but they show fine in the BIOS.

IamMurphy avatar Jun 22 '24 01:06 IamMurphy

looks fine except the pcie init of the switch ports.... and btw I did some searching, and yes, the ethernet is connected to the pcie switch too... so it's not the switch.... it's either a bad card in some way or there's some hardware issue... things you can try....

  • reseat the card
  • remove the nvmes and boot without them, see if any errors show up....
  • test the nvmes on a different pc to make sure they are fine....
  • insert just 1 nvme into the card and retest...
  • see if you have any pcie gen/speed settings in bios, and lower that to its lowest...

this will require some trial-and-error testing... since there's nothing obvious in the logs... sry :(

thor2002ro avatar Jun 22 '24 13:06 thor2002ro

That is unfortunate to hear.

I have tried the card in the same PC running Windows and everything is recognized; it's recognized in the BIOS and both drives show up. And it was running in my QNAP TS-473A prior.

I also tried it in another PC and it worked flawlessly in Windows, and I updated the firmware of the SSDs to rule that out. Is there a way to set the driver for an individual device? Both show as using pcieport instead of nvme:

[1b21:2824] 02:04.0 PCI bridge: ASMedia Technology Inc. ASM2824 PCIe Gen3 Packet Switch (rev 01)

[1b21:2824] 02:08.0 PCI bridge: ASMedia Technology Inc. ASM2824 PCIe Gen3 Packet Switch (rev 01)

I am also currently trying

"kernel /bzimage append initrd=/bzroot nvme_core.default_ps_max_latency_us=0 pcie_aspm=off", but that made no change, unfortunately.

IamMurphy avatar Jun 22 '24 13:06 IamMurphy

> Is there a way to set the driver for an individual device? Both show as using pcieport instead of nvme:
>
> [1b21:2824] 02:04.0 PCI bridge: ASMedia Technology Inc. ASM2824 PCIe Gen3 Packet Switch (rev 01)
> [1b21:2824] 02:08.0 PCI bridge: ASMedia Technology Inc. ASM2824 PCIe Gen3 Packet Switch (rev 01)

that's correct, since those are the pcie bridge ports, not the nvmes.... the nvmes are 1 level down from those.... run "lspci -tvnn" in a terminal on the server and you can see the full picture of the pci devices that are detected...
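One way to check which driver has claimed each function without pciutils is the standard sysfs layout, where every claimed device gets a "driver" symlink (the helper name below is just illustrative; the addresses are the ones from this thread):

```shell
#!/bin/sh
# Print the kernel driver bound to a PCI function, or "none" if nothing
# has claimed it yet, by resolving the sysfs "driver" symlink.
pci_driver() {
  if [ -e "/sys/bus/pci/devices/$1/driver" ]; then
    basename "$(readlink "/sys/bus/pci/devices/$1/driver")"
  else
    echo "none"
  fi
}

pci_driver 0000:02:04.0   # a switch bridge port: should report pcieport
pci_driver 0000:04:00.0   # the endpoint behind it: should report nvme, if detected
```

"none" on the endpoint address would mean the device was never enumerated at all, which matches what the diagnostics show.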

nvme_core.default_ps_max_latency_us=0 pcie_aspm=off is kinda pointless here.....

nvme_core.default_ps_max_latency_us=0 disables nvme power saving AFTER the drive is detected.... pcie_aspm=off disables the pcie power-saving states
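You can also verify what the running kernel actually picked up; the sysfs path below is the standard module-parameter location, and the grep just pulls the option back out of the boot command line:

```shell
#!/bin/sh
# Current value of the nvme_core parameter (path only exists once
# nvme_core is loaded).
p=/sys/module/nvme_core/parameters/default_ps_max_latency_us
[ -r "$p" ] && cat "$p" || echo "nvme_core not loaded"

# Confirm the ASPM option really made it onto the kernel command line.
grep -o 'pcie_aspm=[a-z]*' /proc/cmdline || echo "pcie_aspm not on cmdline"
```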

I still get the feeling this is bios related....

you can try

  • pci=realloc
  • pci=assign-busses
  • pci=pcie_scan_all
  • pci=noacpi
  • pci=noearly

DO NOT DO THEM ALL AT ONCE.... 1 per boot :)))) I mean, you could do all of them separated by ",", but.... I don't know... :)

edit: 1 more thing you can try... enable CSM boot, aka legacy boot...
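On Unraid, a sketch of where these go, assuming the usual flash-drive layout (`/boot/syslinux/syslinux.cfg`, also editable via Main → Flash → Syslinux Configuration in the web GUI): add one option to the append line, reboot, test, then swap it for the next one. Keep whatever is already on your append line.

```
label Unraid OS
  kernel /bzimage
  append initrd=/bzroot pci=realloc
```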

thor2002ro avatar Jun 22 '24 13:06 thor2002ro

~# lspci -tvnn

```
-[0000:00]-+-00.0  Intel Corporation 12th Gen Core Processor Host Bridge/DRAM Registers [8086:4641]
           +-01.0-[01-06]----00.0-[02-06]--+-00.0-[03]----00.0  Aquantia Corp. AQC113C NBase-T/IEEE 802.3an Ethernet Controller [Marvell Scalable mGig] [1d6a:14c0]
           |                               +-04.0-[04]--
           |                               +-08.0-[05]--
           |                               \-0c.0-[06]--
           +-02.0  Intel Corporation Alder Lake-P GT2 [Iris Xe Graphics] [8086:46a6]
           +-06.0-[07]----00.0  Sandisk Corp WD Black SN850X NVMe SSD [15b7:5030]
           +-06.2-[08]----00.0  Sandisk Corp WD Black SN850X NVMe SSD [15b7:5030]
           +-08.0  Intel Corporation 12th Gen Core Processor Gaussian & Neural Accelerator [8086:464f]
           +-0a.0  Intel Corporation Platform Monitoring Technology [8086:467d]
           +-14.0  Intel Corporation Alder Lake PCH USB 3.2 xHCI Host Controller [8086:51ed]
           +-14.2  Intel Corporation Alder Lake PCH Shared SRAM [8086:51ef]
           +-16.0  Intel Corporation Alder Lake PCH HECI Controller [8086:51e0]
           +-17.0  Intel Corporation Alder Lake-P SATA AHCI Controller [8086:51d3]
           +-1c.0-[09]----00.0  JMicron Technology Corp. JMB58x AHCI SATA controller [197b:0585]
           +-1f.0  Intel Corporation Alder Lake PCH eSPI Controller [8086:5182]
           +-1f.3  Intel Corporation Alder Lake PCH-P High Definition Audio Controller [8086:51c8]
           +-1f.4  Intel Corporation Alder Lake PCH-P SMBus Host Controller [8086:51a3]
           \-1f.5  Intel Corporation Alder Lake-P PCH SPI Controller [8086:51a4]
```

I saw someone on the Unraid forums mention the ASPM and latency flags and say it worked for them, so I tried it.

Unfortunately, if it is BIOS related, there are no updates. It's an Erying 12700H ITX board. But I don't understand why the BIOS recognizes the drives, and it works in Windows, if it's BIOS related.

https://www.aliexpress.com/item/1005005373794774.html

IamMurphy avatar Jun 22 '24 14:06 IamMurphy

> looks fine except the pcie init of the switch ports.... and btw I did some searching, and yes, the ethernet is connected to the pcie switch too... so it's not the switch.... it's either a bad card in some way or there's some hardware issue... things you can try....
>
> • reseat the card
> • remove the nvmes and boot without them, see if any errors show up....
> • test the nvmes on a different pc to make sure they are fine....
> • insert just 1 nvme into the card and retest...
> • see if you have any pcie gen/speed settings in bios, and lower that to its lowest...
>
> this will require some trial-and-error testing... since there's nothing obvious in the logs... sry :(

Is the patch enabled in your kernel (by default it's disabled in the Linux kernel)? Someone from Limetech messaged me and said to try the 7.0 beta when released, as the patch should be in their 6.8 kernel (it's not currently).

IamMurphy avatar Jun 22 '24 18:06 IamMurphy

you know you can look at the source in this git, right? where is it disabled? "broken device, retraining non-functional downstream link at 2.5GT/s\n" is from the patch https://github.com/thor2002ro/unraid_kernel/commit/a89c82249c3763780522f763dd2e615e2ea114de
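A quick way to check on your side whether the quirk actually fired at boot (the helper name is illustrative; the string is the exact message the patch prints):

```shell
#!/bin/sh
# Count how many times the quirk's log message appears in a saved
# dmesg log (e.g. one produced earlier with `dmesg > log.txt`).
check_quirk_fired() {
  grep -c "retraining non-functional downstream link" "$1"
}

if [ -f log.txt ]; then
  n=$(check_quirk_fired log.txt || true)
  echo "quirk message seen $n time(s) in log.txt"   # 0 means it never triggered
fi
```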

thor2002ro avatar Jun 22 '24 19:06 thor2002ro

> you know you can look at the source in this git, right? where is it disabled? "broken device, retraining non-functional downstream link at 2.5GT/s\n" is from the patch a89c822

I apologize, I am not super familiar with the Linux kernel or reading its source.

I was told it's disabled by default, as it can cause other issues, so I wasn't sure.

IamMurphy avatar Jun 22 '24 23:06 IamMurphy

> > Is there a way to set the driver for an individual device? Both show as using pcieport instead of nvme:
> >
> > [1b21:2824] 02:04.0 PCI bridge: ASMedia Technology Inc. ASM2824 PCIe Gen3 Packet Switch (rev 01)
> > [1b21:2824] 02:08.0 PCI bridge: ASMedia Technology Inc. ASM2824 PCIe Gen3 Packet Switch (rev 01)
>
> that's correct, since those are the pcie bridge ports, not the nvmes.... the nvmes are 1 level down from those.... run "lspci -tvnn" in a terminal on the server and you can see the full picture of the pci devices that are detected...
>
> nvme_core.default_ps_max_latency_us=0 pcie_aspm=off is kinda pointless here.....
>
> nvme_core.default_ps_max_latency_us=0 disables nvme power saving AFTER the drive is detected.... pcie_aspm=off disables the pcie power-saving states
>
> I still get the feeling this is bios related....
>
> you can try
>
> • pci=realloc
> • pci=assign-busses
> • pci=pcie_scan_all
> • pci=noacpi
> • pci=noearly
>
> DO NOT DO THEM ALL AT ONCE.... 1 per boot :)))) I mean, you could do all of them separated by ",", but.... I don't know... :)
>
> edit: 1 more thing you can try... enable CSM boot, aka legacy boot...

So I ran through all those options and nothing changed. I decided to swap SSDs again (I tried this previously when I first posted, with no change), and the Western Digital SN770s are detected……

```
[1b21:2824] 02:00.0 PCI bridge: ASMedia Technology Inc. ASM2824 PCIe Gen3 Packet Switch (rev 01)
[1d6a:14c0] 03:00.0 Ethernet controller: Aquantia Corp. AQC113C NBase-T/IEEE 802.3an Ethernet Controller [Marvell Scalable mGig] (rev 03)
[1b21:2824] 02:04.0 PCI bridge: ASMedia Technology Inc. ASM2824 PCIe Gen3 Packet Switch (rev 01)
[15b7:5017] 04:00.0 Non-Volatile memory controller: Sandisk Corp WD Black SN770 / PC SN740 256GB / PC SN560 (DRAM-less) NVMe SSD (rev 01)
[N:0:0:1]   disk    WD_BLACK SN770 1TB__1  /dev/nvme0n1  1.00TB
[1b21:2824] 02:08.0 PCI bridge: ASMedia Technology Inc. ASM2824 PCIe Gen3 Packet Switch (rev 01)
[15b7:5017] 05:00.0 Non-Volatile memory controller: Sandisk Corp WD Black SN770 / PC SN740 256GB / PC SN560 (DRAM-less) NVMe SSD (rev 01)
[N:1:0:1]   disk    WD_BLACK SN770 1TB__1  /dev/nvme1n1  1.00TB
[1b21:2824] 02:0c.0 PCI bridge: ASMedia Technology Inc. ASM2824 PCIe Gen3 Packet Switch (rev 01)
```

I mean, success, it's working! But I wanted to use the Solidigm P41s, as they are larger (2TB each), and they were recognized by QNAP and Windows in the QM2 card.

IamMurphy avatar Jun 23 '24 01:06 IamMurphy

so wait.... let me get this straight... with a different type of ssd it works? this is getting weird if so... it means there might be something wrong with the ssds' firmware. did you look for a fw update?

comparing it to windows just lets you know it's working at some point.... doesn't mean it will work for sure :))) it's complicated.... the bios has portions that only work on windows and are not os agnostic.... and it affects pcie discovery

it would be interesting to see if the card with the ssds works on another system that's not a custom chinese laptop refurb :) that bios could have lots of issues....

thor2002ro avatar Jun 23 '24 17:06 thor2002ro

> so wait.... let me get this straight... with a different type of ssd it works? this is getting weird if so... it means there might be something wrong with the ssds' firmware. did you look for a fw update?
>
> comparing it to windows just lets you know it's working at some point.... doesn't mean it will work for sure :))) it's complicated.... the bios has portions that only work on windows and are not os agnostic.... and it affects pcie discovery
>
> it would be interesting to see if the card with the ssds works on another system that's not a custom chinese laptop refurb :) that bios could have lots of issues....

Yes, with a different SSD it worked. I tried this back when I originally posted, but it didn't work then; with the newest kernel it now works.

I updated the firmware on both SSDs on a different Windows system originally as well, without issue. So is it a controller issue with the Solidigm P41 Plus?

IamMurphy avatar Jun 24 '24 17:06 IamMurphy

could be a lot of factors..... but mostly it depends on how the nvmes negotiate pcie speeds. since the switch has problems with pcie speeds and jiggles them around, the controller could just give up...

thor2002ro avatar Jun 24 '24 18:06 thor2002ro