AQtion icon indicating copy to clipboard operation
AQtion copied to clipboard

Fixed PTP software timestamping.

Open peci1 opened this issue 1 month ago • 0 comments

The TX path ignored whether the SKB wanted a SW or HW stamp, and once HW timestamping was enabled for the whole NIC, it always returned HW stamps. With this fix, if the TX packet requests SW stamp, it skips the whole PTP path (unless both HW and SW stamps are requested).

On the RX path, the driver has started in an inconsistent state. By default, no timestamping should be done for RX packets. However, the driver automatically enabled RX timestamping on initialization. This fix correctly initializes the NIC with no timestamping and it needs to be set up via SIOCSHWTSTAMP.

There was also a problem with the RX path on init. The NIC did not put PTP packets on the PTP ring, but it did append timestamps to them. That resulted in wrong RX packets which were 12/14 bytes longer, because the timestamp length is only subtracted from PTP ring packets. This was worked around by adding a "flipping" enable_ptp() call that first sets the opposite value and then the actually wanted one.


I tested this on QNAP QXG-10G2T with two AQC107S and firmware 3.1.121 on an NVidia Jetson AGX Xavier with kernel 4.9.337.

The problem manifested when using with linuxptp and passing the -S flag to ptp4l. As a master, it told that there was no tx timestamp found for the sent SYNC packet (because it received a HW stamp which overrides the requested SW stamp). As a slave, it failed to read incoming SYNC messages with bad message error which was caused by the incorrect length of the RX packet.

I tested this fix both at cold boot (which brings the weird RX config mismatch) and on-the-fly switching between timestamping on and off (sudo hwstamp_ctl -i eth2 -r 0 -t 0 and sudo hwstamp_ctl -i eth2 -r 12 -t 1), link losses and reacquisitions etc. Everything works with this PR.

I also tested this with FW 3.0.33, which does not have PTP support. The driver does, however, provide full TX and RX software timestamping capabilities, and these also work with this PR.

I also tested that HW timestamping still works correctly with this PR (on 3.1.121 FW).

I have a few Thunderbolt cards with AQC107S, too, so I'll try to test with these on a normal notebook with newer kernel. I also have one card with (probably) AQC113C, so I'll also test with that one.

Update:

  • Tested on Sonnet Solo10G (FW 3.1.121) (AQC107S) on x86_64 kernel 5.15 without issues.
  • Tested on OWCTB3ADP10GBE (AQC107S) on x86_64 kernel 5.15 with stock FW 3.1.106 without issues. With 3.1.121, I could not get link.
  • Tested on Sanlink3 N1 (AQC107S) on x86_64 kernel 5.15 with stock FW 3.0.5 without issues (but this FW has only SW timestamping). With FW 3.1.121, no issues were observed.

I've added the NEEDS_PTP_FLIP quirk to all cards. I don't really know if it's needed for all of them. But I assume it can't hurt anything.

Also, I needed to remove check for !aq_ptp->a1_ptp from aq_ptp_clock_init() so that the initial weird RX state is fixed. I don't really know why this check was there.

peci1 avatar Jun 25 '24 23:06 peci1