AQC113 card unstable under Linux 6.14.5
Sometimes I boot my computer only to find that my PCIe card, a TP-Link TX401, is spamming PCIe AER errors to the kernel log. The errors themselves are innocuous (they are reported as correctable), but the constant logging will eventually wear down my SSD, so I have to reboot two or three times before the driver finally handles the card correctly.
The spam:
kernel: pcieport 0000:00:03.2: AER: Correctable error message received from 0000:11:00.0
kernel: atlantic 0000:11:00.0: PCIe Bus Error: severity=Correctable, type=Physical Layer, (Receiver ID)
kernel: atlantic 0000:11:00.0: device [1d6a:04c0] error status/mask=00000001/0000e000
kernel: atlantic 0000:11:00.0: [ 0] RxErr (First)
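For anyone reading the "error status/mask" line: the two hex values are the AER Correctable Error Status and Mask registers. Below is a minimal Python sketch that decodes them, assuming the standard bit positions from the PCIe AER capability and using the same short names lspci prints. The function names are my own, not part of any tool.

```python
# Correctable Error Status/Mask bit positions (PCIe AER capability),
# labelled with the short names that lspci uses in CESta/CEMsk.
CORRECTABLE_BITS = {
    0: "RxErr",
    6: "BadTLP",
    7: "BadDLLP",
    8: "Rollover",
    12: "Timeout",
    13: "AdvNonFatalErr",
    14: "CorrIntErr",
    15: "HeaderOF",
}


def decode_correctable(value: int) -> list[str]:
    """Return the names of all correctable-error bits set in value."""
    return [
        name
        for bit, name in sorted(CORRECTABLE_BITS.items())
        if value & (1 << bit)
    ]


def decode_status_mask(field: str) -> tuple[list[str], list[str]]:
    """Decode a 'status/mask' pair as printed in the kernel log."""
    status_hex, mask_hex = field.split("/")
    return (
        decode_correctable(int(status_hex, 16)),
        decode_correctable(int(mask_hex, 16)),
    )


status, mask = decode_status_mask("00000001/0000e000")
print("status:", status)  # ['RxErr']
print("mask:  ", mask)    # ['AdvNonFatalErr', 'CorrIntErr', 'HeaderOF']
```

So the only error being raised is a receiver error at the physical layer, and the masked bits match the CEMsk line in the lspci output.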
The card in lspci:
11:00.0 Ethernet controller: Aquantia Corp. AQtion AQC113 NBase-T/IEEE 802.3an Ethernet Controller [Antigua 10G] (rev 03)
Subsystem: Aquantia Corp. Device 0001
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 64 bytes
Interrupt: pin A routed to IRQ 144
IOMMU group: 3
Region 0: Memory at fc000000 (64-bit, non-prefetchable) [size=512K]
Region 2: Memory at fc0a0000 (64-bit, non-prefetchable) [size=4K]
Region 4: Memory at fbc00000 (64-bit, non-prefetchable) [size=4M]
Expansion ROM at fc080000 [disabled] [size=128K]
Capabilities: [40] Power Management version 3
Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=375mA PME(D0+,D1+,D2-,D3hot+,D3cold+)
Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [50] MSI: Enable- Count=1/32 Maskable+ 64bit+
Address: 0000000000000000 Data: 0000
Masking: 00000000 Pending: 00000000
Capabilities: [70] Express (v2) Endpoint, IntMsgNum 0
DevCap: MaxPayload 512 bytes, PhantFunc 0, Latency L0s unlimited, L1 unlimited
ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 75W TEE-IO-
DevCtl: CorrErr+ NonFatalErr+ FatalErr+ UnsupReq+
RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+ FLReset-
MaxPayload 512 bytes, MaxReadReq 512 bytes
DevSta: CorrErr+ NonFatalErr- FatalErr- UnsupReq+ AuxPwr+ TransPend-
LnkCap: Port #0, Speed 16GT/s, Width x4, ASPM not supported
ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
LnkCtl: ASPM Disabled; RCB 64 bytes, LnkDisable- CommClk+
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 16GT/s, Width x4
TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
DevCap2: Completion Timeout: Not Supported, TimeoutDis+ NROPrPrP- LTR+
10BitTagComp+ 10BitTagReq- OBFF Via message/WAKE#, ExtFmt- EETLPPrefix-
EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
FRS- TPHComp- ExtTPHComp-
AtomicOpsCap: 32bit- 64bit- 128bitCAS-
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-
AtomicOpsCtl: ReqEn-
IDOReq- IDOCompl- LTR+ EmergencyPowerReductionReq-
10BitTagReq- OBFF Disabled, EETLPPrefixBlk-
LnkCap2: Supported Link Speeds: 2.5-16GT/s, Crosslink- Retimer+ 2Retimers+ DRS-
LnkCtl2: Target Link Speed: 16GT/s, EnterCompliance- SpeedDis-
Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
Compliance Preset/De-emphasis: -6dB de-emphasis, 0dB preshoot
LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete+ EqualizationPhase1+
EqualizationPhase2+ EqualizationPhase3+ LinkEqualizationRequest-
Retimer- 2Retimers- CrosslinkRes: Upstream Port
Capabilities: [b0] MSI-X: Enable+ Count=32 Masked-
Vector table: BAR=2 offset=00000000
PBA: BAR=2 offset=00000200
Capabilities: [d0] Vital Product Data
Product Name: Marvell AQtion Network Adapter
Read-only fields:
[PN] Part number: 00B1E113
[V0] Vendor specific: MAC Addr: <Redacted>
[V1] Vendor specific: Bundle Version: 1.5.38
[V2] Vendor specific: Fw Version: 1.2.122
[RV] Reserved: checksum good, 0 byte(s) reserved
End
Capabilities: [100 v2] Advanced Error Reporting
UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP-
ECRC- UnsupReq- ACSViol- UncorrIntErr- BlockedTLP- AtomicOpBlocked- TLPBlockedErr-
PoisonTLPBlocked- DMWrReqBlocked- IDECheck- MisIDETLP- PCRC_CHECK- TLPXlatBlocked-
UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP-
ECRC- UnsupReq- ACSViol- UncorrIntErr+ BlockedTLP- AtomicOpBlocked- TLPBlockedErr-
PoisonTLPBlocked- DMWrReqBlocked- IDECheck- MisIDETLP- PCRC_CHECK- TLPXlatBlocked-
UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+
ECRC- UnsupReq- ACSViol- UncorrIntErr+ BlockedTLP- AtomicOpBlocked- TLPBlockedErr-
PoisonTLPBlocked- DMWrReqBlocked- IDECheck- MisIDETLP- PCRC_CHECK- TLPXlatBlocked-
CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr- CorrIntErr- HeaderOF-
CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+ CorrIntErr+ HeaderOF+
AERCap: First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn-
MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
HeaderLog: 00000000 00000000 00000000 00000000
Capabilities: [148 v1] Virtual Channel
Caps: LPEVC=0 RefClk=100ns PATEntryBits=1
Arb: Fixed+ WRR32- WRR64- WRR128-
Ctrl: ArbSelect=Fixed
Status: InProgress-
VC0: Caps: PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
Arb: Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
Ctrl: Enable+ ID=0 ArbSelect=Fixed TC/VC=01
Status: NegoPending- InProgress-
Capabilities: [168 v1] Device Serial Number 00-00-00-00-00-00-00-00
Capabilities: [178 v1] Secondary PCI Express
LnkCtl3: LnkEquIntrruptEn- PerformEqu-
LaneErrStat: 0
Capabilities: [198 v1] Physical Layer 16.0 GT/s <?>
Capabilities: [1bc v1] Lane Margining at the Receiver
PortCap: Uses Driver-
PortSta: MargReady+ MargSoftReady-
Capabilities: [1d4 v1] Latency Tolerance Reporting
Max snoop latency: 1048576ns
Max no snoop latency: 1048576ns
Capabilities: [1dc v1] L1 PM Substates
L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
PortCommonModeRestoreTime=10us PortTPowerOnTime=14us
L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1-
T_CommonMode=0us LTR1.2_Threshold=32768ns
L1SubCtl2: T_PwrOn=14us
Capabilities: [1ec v1] Vendor Specific Information: ID=0002 Rev=4 Len=100 <?>
Capabilities: [2ec v1] Data Link Feature <?>
Capabilities: [2f8 v1] Precision Time Measurement
PTMCap: Requester+ Responder- Root-
PTMClockGranularity: Unimplemented
PTMControl: Enabled- RootSelected-
PTMEffectiveGranularity: Unknown
Capabilities: [304 v1] Vendor Specific Information: ID=0003 Rev=1 Len=054 <?>
Kernel driver in use: atlantic
Kernel modules: atlantic
There are reports of people having issues with the AQC113C, though I don't think mine is that variant of the chip, so I feel pretty alone in this. The card was bought new and otherwise operates fine.
I'm also running 6.14.5 (just upgraded to 6.14.6) and have a TP-Link TX401. I haven't seen any issues so far.
You need to have the PCIe AER capability enabled in the BIOS to see those messages, and of course even if you did see them, the card would still have worked.
From further research, it seems that this new Aquantia chip is less well supported and that the driver sometimes initializes it slightly wrong. A reboot fixes it, which is what I've been doing lately.
But it happens only rarely.
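To tell whether a given boot landed in the bad state without watching the log, the per-device AER counters can be read instead. This is a rough Python sketch, assuming the kernel exposes the aer_dev_correctable sysfs attribute for the device and that its last line is a TOTAL_ERR_COR counter; the helper names are hypothetical and the address 0000:11:00.0 is taken from my lspci output above.

```python
from pathlib import Path


def parse_aer_counters(text: str) -> dict[str, int]:
    """Parse 'name count' lines; the name itself may contain spaces."""
    counters = {}
    for line in text.splitlines():
        parts = line.rsplit(None, 1)
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def read_aer_correctable(bdf: str) -> dict[str, int]:
    """Read correctable-error counters for a PCI device, or {} if unavailable."""
    # Assumed path; it is absent when AER is not enabled for the device.
    path = Path("/sys/bus/pci/devices") / bdf / "aer_dev_correctable"
    try:
        return parse_aer_counters(path.read_text())
    except OSError:
        return {}


if __name__ == "__main__":
    counters = read_aer_correctable("0000:11:00.0")
    if not counters:
        print("no AER counters available for this device")
    else:
        # TOTAL_ERR_COR is the assumed name of the summary counter.
        print("correctable errors so far:", counters.get("TOTAL_ERR_COR"))
```

Running it twice a few seconds apart and comparing the totals should show whether the card is currently spamming errors; if the count is not growing, this boot is a good one.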
Got it. I'm pretty sure I don't have PCIe AER enabled in my BIOS, since I don't recall seeing that option after going over the various settings a few times. I was just providing some evidence that it works in case that was a useful data point, but it sounds like you already know when it can and cannot work.
I've also noticed that driver development for this chip is not very active. This repo hasn't been updated in many years and the latest Linux kernel source is still quite similar other than small compatibility fixes over time.
As for my own gripes, WOL is not working, while it works great on Windows (covered in another issue here).