Scanning PHYs that only support c45 doesn't work

Open abajk opened this issue 4 months ago • 10 comments

I have an SFP module with a built-in Aquantia AQR113C PHY. It is accessed via the rollball protocol. The mdio application fails to scan the bus such a PHY sits on:

...
[   29.762375] mtk_soc_eth 15100000.ethernet sfp-lan: PHY [i2c:sfp2:11] driver [Aquantia AQR113C] (irq=POLL)
[   29.924656] mtk_soc_eth 15100000.ethernet sfp-wan: configuring for inband/10gbase-r link mode
[   29.957040] br-wan: port 2(sfp-wan) entered blocking state
[   29.962547] br-wan: port 2(sfp-wan) entered disabled state
[   29.968045] mtk_soc_eth 15100000.ethernet sfp-wan: entered allmulticast mode
[   29.975199] mtk_soc_eth 15100000.ethernet sfp-wan: entered promiscuous mode
root@OpenWrt:~# mdio
fixed-0
i2c:sfp2
mdio-bus
mt7530-0
root@OpenWrt:~# mdio i2c:sfp2
ERROR: Unable to read status (-95)

abajk avatar Aug 31 '25 18:08 abajk

In contrast, another SFP module with a c22 PHY, accessed via I2C address 0x56, works fine:

...
[  107.671312] sfp sfp2: module FS               SFP-GB-GE-T      rev F    sn F2032210361      dc 210303  
[  107.777196] mtk_soc_eth 15100000.ethernet sfp-lan: switched to inband/sgmii link mode
[  107.934790] mtk_soc_eth 15100000.ethernet sfp-lan: PHY [i2c:sfp2:16] driver [Marvell 88E1111] (irq=POLL)
[  111.115815] mtk_soc_eth 15100000.ethernet sfp-lan: Link is Up - 1Gbps/Full - flow control rx/tx
[  111.115843] br-lan: port 4(sfp-lan) entered blocking state
[  111.129987] br-lan: port 4(sfp-lan) entered forwarding state

root@OpenWrt:~# 
root@OpenWrt:~# mdio
fixed-0
i2c:sfp2
mdio-bus
mt7530-0
root@OpenWrt:~# mdio i2c:sfp2
 DEV      PHY-ID  LINK
0x16  0x01410cc2  up

abajk avatar Aug 31 '25 18:08 abajk

Is there anything that can be done? :)

abajk avatar Aug 31 '25 18:08 abajk

I doubt that it has anything to do with the fact that it is C45-over-RollBall-over-I2C. mdio-netlink just defers to the kernel drivers to sort that out.

Today the bus_status() logic, i.e. the code that runs on mdio <BUS>, assumes a C22 bus: https://github.com/wkz/mdio-tools/blob/cd8a90801974afc64eabea664f15095b87dc289c/src/mdio/bus.c#L8-L53

In other words, there is no C45 probing support on any bus, but this is certainly something we could (should!) add.

If you know the address of the Aquantia PHY, you should be able to access it with mdio i2c:sfp2 mmd <port>:<dev> - even though the bus probing is not in place.
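For reference, a minimal sketch of what a C45 probe step could look like. It is not the mdio-tools implementation: the `c45_read_fn` callback type and all function names are hypothetical, and the validity check (rejecting all-zeros and all-ones) is a simplifying assumption. The only fact it relies on is that each MMD exposes its identifier in registers 2 and 3, which matches the values 0x31c3 and 0x1c13 reported later in this thread.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical accessor: reads one 16-bit register from MMD `dev` of the
 * PHY at `port`. Returns 0 on success or a negative errno. This type is
 * not part of mdio-tools; it only keeps the sketch self-contained. */
typedef int (*c45_read_fn)(void *ctx, uint8_t port, uint8_t dev,
                           uint16_t reg, uint16_t *val);

/* Combine identifier registers 2 (high half) and 3 (low half). */
static inline uint32_t c45_phy_id(uint16_t id1, uint16_t id2)
{
    return ((uint32_t)id1 << 16) | id2;
}

/* Assumption: an ID of all zeros or all ones means nothing answered. */
static inline bool c45_phy_id_valid(uint32_t id)
{
    return id != 0x00000000u && id != 0xffffffffu;
}

/* Try each candidate MMD on one port and report the first valid ID. */
static inline bool c45_probe_port(c45_read_fn rd, void *ctx, uint8_t port,
                                  const uint8_t *devs, int ndevs,
                                  uint32_t *id)
{
    for (int i = 0; i < ndevs; i++) {
        uint16_t id1, id2;

        /* Skip MMDs whose registers cannot be read at all. */
        if (rd(ctx, port, devs[i], 2, &id1) ||
            rd(ctx, port, devs[i], 3, &id2))
            continue;

        *id = c45_phy_id(id1, id2);
        if (c45_phy_id_valid(*id))
            return true;
    }
    return false;
}
```

A bus scan would call `c45_probe_port()` for ports 0 to 31 with a short list of MMDs to try. Which MMDs to try, and how to bound the time spent on a slow bus, are design decisions this sketch leaves open.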

wkz avatar Sep 01 '25 11:09 wkz

> In other words, there is no C45 probing support on any bus, but this is certainly something we could (should!) add.

Support for c45 would be very useful for debugging SFP modules. Do you have any plans to add this functionality?

> If you know the address of the Aquantia PHY, you should be able to access it with mdio i2c:sfp2 mmd <port>:<dev> - even though the bus probing is not in place.

The status command, which reads a whole set of registers, doesn't work either. However, reading individual registers works.

root@OpenWrt:~# mdio i2c:sfp2 mmd 17:30
ERROR: Unable to read status (-110)
root@OpenWrt:~# mdio i2c:sfp2 mmd 17:30 raw 0x2
0x31c3
root@OpenWrt:~# mdio i2c:sfp2 mmd 17:30 raw 0x3
0x1c13

abajk avatar Sep 01 '25 20:09 abajk

> > In other words, there is no C45 probing support on any bus, but this is certainly something we could (should!) add.

> Support for c45 would be very useful for debugging SFP modules. Do you have any plans to add this functionality?

Hand-on-heart: probably not until I find myself needing it 😄

I'd be happy to accept a PR that adds it though.

> > If you know the address of the Aquantia PHY, you should be able to access it with mdio i2c:sfp2 mmd <port>:<dev> - even though the bus probing is not in place.

> Reading all registers also doesn't work. However, reading individual registers works.

> root@OpenWrt:~# mdio i2c:sfp2 mmd 17:30
> ERROR: Unable to read status (-110)

You're getting -ETIMEDOUT. An mdio-netlink program will run with a default timeout of 100ms, which should be more than enough time to read the 16 registers that your command should trigger.
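As a side note on reading these codes: the numbers mdio prints are negated Linux errno values, so -110 is ETIMEDOUT and the -95 from the original report is EOPNOTSUPP. A small sketch for turning them into text follows. The helper name `mdio_err_str` is made up for illustration, and the numeric values assume Linux on a common architecture such as x86 or arm64.

```c
#include <errno.h>
#include <string.h>

/* Hypothetical helper: accept either a negated (kernel-style) or a
 * positive errno value and return the C library's description of it. */
static inline const char *mdio_err_str(int err)
{
    return strerror(err < 0 ? -err : err);
}
```

Calling `mdio_err_str(-110)` yields the same text as `strerror(ETIMEDOUT)`. The exact wording depends on the C library and locale.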

Is this an unusually slow bus? Bit-banged I2C?

Do you get the same error with mdio i2c:sfp2 mmd 17:30 dump 0+15? What about mdio i2c:sfp2 mmd 17:30 dump 0+7, mdio i2c:sfp2 mmd 17:30 dump 0+3, etc.?

wkz avatar Sep 01 '25 21:09 wkz

> Hand-on-heart: probably not until I find myself needing it 😄

> I'd be happy to accept a PR that adds it though.

Then I'll try to write the missing pieces of code :D

> Is this an unusually slow bus? Bit-banged I2C?

The I2C controller is a hardware block, not bit-banged. My board is a Banana Pi R4 with an MT7988A SoC. There is also an I2C multiplexer between the SFP cage and the SoC.

This is probably unrelated, but I also have an SFP+ module with an RTL8261N/RTL8261BE PHY that doesn't work. I noticed that it needs additional delays between commands. That issue is on the kernel side.

> Do you get the same error with mdio i2c:sfp2 mmd 17:30 dump 0+15, what about mdio i2c:sfp2 mmd 17:30 dump 0+7, mdio i2c:sfp2 mmd 17:30 dump 0+3 etc?

Reading register ranges works:

root@OpenWrt:~# mdio i2c:sfp2 mmd 17:3 dump 0+15
0x0000: 0x2040
0x0001: 0x0002
0x0002: 0x31c3
0x0003: 0x1c13
0x0004: 0x00c1
0x0005: 0x009a
0x0006: 0xe000
0x0007: 0x0003
0x0008: 0xb009
0x0009: 0x0000
0x000a: 0x0000
0x000b: 0x0000
0x000c: 0x0000
0x000d: 0x0000
0x000e: 0x31c3
0x000f: 0x1c13
root@OpenWrt:~# mdio i2c:sfp2 mmd 17:3 dump 0+7
0x0000: 0x2040
0x0001: 0x0002
0x0002: 0x31c3
0x0003: 0x1c13
0x0004: 0x00c1
0x0005: 0x009a
0x0006: 0xe000
0x0007: 0x0003
root@OpenWrt:~# mdio i2c:sfp2 mmd 17:3 dump 0+3
0x0000: 0x2040
0x0001: 0x0002
0x0002: 0x31c3
0x0003: 0x1c13

The PHY answers on several MMDs (1, 3, 4, 7, 29, and 30):

root@OpenWrt:~# mdio i2c:sfp2 mmd 17:1 dump 0x2
0x0002: 0x31c3
root@OpenWrt:~# mdio i2c:sfp2 mmd 17:3 dump 0x2
0x0002: 0x31c3
root@OpenWrt:~# mdio i2c:sfp2 mmd 17:4 dump 0x2
0x0002: 0x31c3
root@OpenWrt:~# mdio i2c:sfp2 mmd 17:7 dump 0x2
0x0002: 0x31c3
root@OpenWrt:~# mdio i2c:sfp2 mmd 17:29 dump 0x2
0x0002: 0x31c3
root@OpenWrt:~# mdio i2c:sfp2 mmd 17:30 dump 0x2
0x0002: 0x31c3
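The set of MMDs can also be cross-checked against the "devices in package" registers (5 and 6) of any MMD, which Clause 45 defines as a bitmap with one bit per MMD number. A minimal sketch, using the values 0x009a and 0xe000 from the dump above; the function and macro names are illustrative and not taken from mdio-tools.

```c
#include <stdint.h>

#define C45_REG_DEVS1 5 /* devices in package, bits 15:0 */
#define C45_REG_DEVS2 6 /* devices in package, bits 31:16 */

/* Merge the two 16-bit registers into one 32-bit bitmap. */
static inline uint32_t c45_devs_in_package(uint16_t devs1, uint16_t devs2)
{
    return ((uint32_t)devs2 << 16) | devs1;
}

/* Non-zero if the bitmap advertises MMD number `mmd` (0 to 31). */
static inline int c45_mmd_present(uint32_t devs, unsigned int mmd)
{
    return mmd < 32 && ((devs >> mmd) & 1u);
}
```

With registers 5 = 0x009a and 6 = 0xe000 the bitmap is 0xe000009a, which has bits 1, 3, 4, 7, 29, 30 and 31 set. That agrees with the MMDs tried here and additionally advertises MMD 31, which was not tested in this thread.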

However, the status command times out on each of them:

root@OpenWrt:~# mdio i2c:sfp2 mmd 17:1
ERROR: Unable to read status (-110)
root@OpenWrt:~# mdio i2c:sfp2 mmd 17:3
ERROR: Unable to read status (-110)
root@OpenWrt:~# mdio i2c:sfp2 mmd 17:4
ERROR: Unable to read status (-110)
root@OpenWrt:~# mdio i2c:sfp2 mmd 17:7
ERROR: Unable to read status (-110)
root@OpenWrt:~# mdio i2c:sfp2 mmd 17:29
ERROR: Unable to read status (-110)
root@OpenWrt:~# mdio i2c:sfp2 mmd 17:30
ERROR: Unable to read status (-110)

EDIT: I added a bunch of debug printf's to the kernel, and they can slow down reads.

abajk avatar Sep 01 '25 21:09 abajk

Is /sys/class/mdio_bus/i2c:sfp2/statistics/errors_17 non-zero? Is that the source of the timeouts, or is it from the timeout in mdio-netlink?

wkz avatar Sep 02 '25 06:09 wkz

> /sys/class/mdio_bus/i2c:sfp2/statistics/errors_17

With a kernel build that does not have the debug output:

root@OpenWrt:~# mdio i2c:sfp2 mmd 17:3
ERROR: Unable to read status (-110)
root@OpenWrt:~# cat /sys/class/mdio_bus/i2c:sfp2/statistics/errors_17 
0

abajk avatar Sep 02 '25 21:09 abajk

Access to registers in SFP modules is very slow. For comparison, c22 over mdio:

root@OpenWrt:~# mdio mdio-bus 5 bench 0x2
Performed 1000 reads in 28ms

c45 over mdio:

root@OpenWrt:~# mdio mdio-bus 5:3 bench 0x2
Performed 1000 reads in 56ms

c22 over I2C (I2C address 0x56):

root@OpenWrt:~# mdio i2c:sfp2 0x16 bench 0x2
Performed 1000 reads in 544ms

c45 over rollball over i2c:

root@OpenWrt:~# mdio i2c:sfp2 0x11:3 bench 0x2
Benchmark failed after 10.08s
ERROR: Bench operation failed (-110)
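To put these numbers side by side, here is a small sketch that converts a bench result into an average cost per read and estimates how many reads fit into a given timeout. The function names are illustrative only.

```c
#include <stdint.h>

/* Average cost of one read in microseconds for a finished bench run. */
static inline uint32_t bench_us_per_read(uint32_t total_ms, uint32_t reads)
{
    return reads ? (total_ms * 1000u) / reads : 0;
}

/* Number of reads that fit into `timeout_ms` at `us_per_read` each. */
static inline uint32_t reads_within(uint32_t timeout_ms, uint32_t us_per_read)
{
    return us_per_read ? (timeout_ms * 1000u) / us_per_read : 0;
}
```

From the runs above this gives 28 µs per read for c22 over mdio, 56 µs for c45 over mdio, and 544 µs for c22 over I2C. The rollball run did not finish 1000 reads before giving up after 10.08 s, which suggests an average above roughly 10 ms per read, though the exact figure cannot be derived from a failed run.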

abajk avatar Sep 02 '25 21:09 abajk

I think what's going on is that the MMD status command uses mdio_xfer()... https://github.com/wkz/mdio-tools/blob/cd8a90801974afc64eabea664f15095b87dc289c/src/mdio/phy.c#L189 ...which results in a 1s timeout... https://github.com/wkz/mdio-tools/blob/cd8a90801974afc64eabea664f15095b87dc289c/src/mdio/mdio.c#L630-L634 ...whereas bench and dump both use a 10s timeout: https://github.com/wkz/mdio-tools/blob/cd8a90801974afc64eabea664f15095b87dc289c/src/mdio/mdio.c#L486 https://github.com/wkz/mdio-tools/blob/cd8a90801974afc64eabea664f15095b87dc289c/src/mdio/mdio.c#L536

The gist of it: your bus is slower than mdio expects any bus to be. I am hesitant to increase the timeout for the status command as well. I guess we could take a custom timeout as a flag 🤔
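A rough sketch of how the argument to such a flag could be validated. Nothing here exists in mdio-tools: the helper name, the millisecond unit, and the 60000 ms upper bound are all assumptions made for illustration.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Arbitrary upper bound chosen for this sketch, not an mdio-tools limit. */
#define TIMEOUT_MS_MAX 60000ul

/* Hypothetical helper: parse a timeout given in milliseconds. Accepts
 * decimal, hex (0x...) and octal input. Returns 0 and stores the value
 * on success, or -EINVAL for empty, malformed, zero or too-large input. */
static inline int parse_timeout_ms(const char *arg, uint32_t *ms)
{
    char *end;
    unsigned long v;

    if (!arg || !*arg)
        return -EINVAL;

    errno = 0;
    v = strtoul(arg, &end, 0);
    if (errno || *end != '\0' || v == 0 || v > TIMEOUT_MS_MAX)
        return -EINVAL;

    *ms = (uint32_t)v;
    return 0;
}
```

The parsed value would then replace the fixed timeout that the status command currently uses, while the defaults stay unchanged for buses that respond quickly.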

wkz avatar Sep 03 '25 06:09 wkz