sonic-platform-common
DPinit timeout seen for Innolight transceiver during CMIS init + transceiver OIR causing CMIS init failure
Description
A DPInit timeout is seen on the Innolight transceiver when CMIS init coincides with a transceiver OIR, and CMIS init then fails.
On Cisco-Innolight modules this is caused by the SFPUpdate thread making continuous calls to the CDB infra while it updates the FW info. On these modules CDB runs in the foreground, so the host gets continuous NACKs to its CDB calls and the module also misses some CMIS calls.
The fix is to read the FW info from the EEPROM page data instead of from CDB, for Cisco-Innolight only.
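The vendor-gated read described in the fix could be sketched roughly as follows. This is an illustration only, not the code in this PR: `get_active_fw_version`, `EEPROM_FW_VENDORS`, and the two reader callables are hypothetical names, and the offset of 39 for the active firmware major/minor bytes is an assumption that should be checked against the CMIS memory map.

```python
from typing import Callable

# Hypothetical allow-list; the PR description only names Cisco-Innolight.
EEPROM_FW_VENDORS = {"CISCO-INNOLIGHT"}

# Assumed offset of the active firmware major/minor bytes in lower memory.
ACTIVE_FW_OFFSET = 39


def get_active_fw_version(
    vendor_name: str,
    read_eeprom: Callable[[int, int], bytes],
    read_cdb_fw_version: Callable[[], str],
) -> str:
    """Return the active firmware version as a string.

    read_eeprom(offset, length) and read_cdb_fw_version() are stand-ins
    for whatever the platform API provides; they are not real
    sonic-platform-common functions.
    """
    if vendor_name.strip().upper() in EEPROM_FW_VENDORS:
        # Plain EEPROM read: no CDB command is issued to the module.
        raw = read_eeprom(ACTIVE_FW_OFFSET, 2)
        return f"{raw[0]}.{raw[1]}"
    # Every other vendor keeps the existing CDB path.
    return read_cdb_fw_version()
```

In this sketch the EEPROM path reads only two bytes, so it yields a major.minor string, while the CDB path returns whatever the CDB query reports.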
Motivation and Context
How Has This Been Tested?
Will update soon
Additional Information (Optional)
@AnoopKamath - Can you please ensure that the changes are tested with the latest 202305 image (which has all the commits in xcvrd.py up to https://github.com/sonic-net/sonic-platform-daemons/pull/450)?
Also, please capture the output of dmesg | grep optoe as part of testing the changes on device.
@AnoopKamath I cannot take this PR because we still need to read the complete firmware version X.Y.Z. Please fix your module. Nevertheless, PR-450, as pointed out by Mihir, could fix this issue for Innolight, so please try that.
@prgeor, @mihirpat1, I haven’t encountered any OIR issues after using the latest image, which includes Mihir’s PR-450 and doesn’t include my changes. The issue was reproducible on the same setup using older software. The testing was conducted using the sfp-OIR utility. I will also confirm this with a physical OIR test.
@Mihir, can you please confirm whether you observe the same behavior on your setup?
However, I am still encountering ‘optoe: restore page’ error.
dmesg:
[ 111.748911] optoe 64-0050: 32896 byte optoe1 EEPROM, read/write
[ 111.748930] i2c i2c-64: new_device: Instantiated device optoe1 at 0x50
[ 111.749150] optoe 65-0050: 32896 byte optoe1 EEPROM, read/write
[ 111.749168] i2c i2c-65: new_device: Instantiated device optoe1 at 0x50
[ 322.804694] optoe 46-0050: Restore page register to 0 failed:-110!
[ 409.644697] optoe 36-0050: Restore page register to 0 failed:-110!
[ 496.017233] optoe 34-0050: Restore page register to 0 failed:-110!
[ 513.989230] optoe 46-0050: Restore page register to 0 failed:-110!
[ 603.080738] optoe 34-0050: Restore page register to 0 failed:-110!
[ 621.124710] optoe 46-0050: Restore page register to 0 failed:-110!
[ 623.949241] optoe 36-0050: Restore page register to 0 failed:-110!
[ 710.269260] optoe 34-0050: Restore page register to 0 failed:-110!
[ 718.229217] optoe 47-0050: Restore page register to 0 failed:-110!
[ 731.013243] optoe 36-0050: Restore page register to 0 failed:-110!
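To see which ports are affected, the restore-page failures in the dmesg output above can be tallied per I2C device. The following is a minimal sketch, not part of the PR: `count_restore_failures` is a hypothetical helper and the regular expression assumes the exact message format shown in this log. The error code -110 is ETIMEDOUT on Linux.

```python
import re
from collections import Counter

# Matches lines such as:
#   [ 322.804694] optoe 46-0050: Restore page register to 0 failed:-110!
RESTORE_FAIL = re.compile(
    r"optoe (\d+-[0-9a-fA-F]{4}): Restore page register to 0 failed:(-?\d+)!"
)


def count_restore_failures(dmesg_text: str) -> Counter:
    """Count 'Restore page register' failures per optoe device (bus-address)."""
    return Counter(m.group(1) for m in RESTORE_FAIL.finditer(dmesg_text))


if __name__ == "__main__":
    sample = (
        "[ 322.804694] optoe 46-0050: Restore page register to 0 failed:-110!\n"
        "[ 409.644697] optoe 36-0050: Restore page register to 0 failed:-110!\n"
        "[ 513.989230] optoe 46-0050: Restore page register to 0 failed:-110!\n"
    )
    for device, count in count_restore_failures(sample).most_common():
        print(device, count)
```

The same helper can be fed the full output of `dmesg | grep optoe` captured on the device.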
@AnoopKamath Can you please add a snippet of TRANSCEIVER_FIRMWARE_INFO table for this optic?
@mihirpat1: please find the output below:
root@sonic:/home/cisco# sonic-db-cli STATE_DB hgetall "TRANSCEIVER_FIRMWARE_INFO|Ethernet100"
{'active_firmware': '94.10.0', 'inactive_firmware': '0.0.0'}
root@sonic:/home/cisco# show int trans eeprom Ethernet100
Ethernet100: SFP EEPROM detected
Active Firmware: 94.10.0
Active application selected code assigned to host lane 1: 0
Active application selected code assigned to host lane 2: 0
Active application selected code assigned to host lane 3: 0
Active application selected code assigned to host lane 4: 0
Active application selected code assigned to host lane 5: 3
Active application selected code assigned to host lane 6: 3
Active application selected code assigned to host lane 7: 3
Active application selected code assigned to host lane 8: 3
Application Advertisement: 800G L C2M (placeholder) - Host Assign (0x1) - Undefined - Media Assign (0x1)
800G S C2M (placeholder) - Host Assign (0x1) - Undefined - Media Assign (0x1)
400GAUI-4-L C2M (Annex 120G) - Host Assign (0x11) - 400GBASE-DR4 (Cl 124) - Media Assign (0x11)
400GAUI-4-S C2M (Annex 120G) - Host Assign (0x11) - 400GBASE-DR4 (Cl 124) - Media Assign (0x11)
100GAUI-1-L C2M (Annex 120G) - Host Assign (0xff) - 100G-FR/100GBASE-FR1 (Cl 140) - Media Assign (0xff)
100GAUI-1-S C2M (Annex 120G) - Host Assign (0xff) - 100G-FR/100GBASE-FR1 (Cl 140) - Media Assign (0xff)
CMIS Rev: 5.0
Connector: MPO 1x12
Encoding: N/A
Extended Identifier: Power Class 8 (17.0W Max)
Extended RateSelect Compliance: N/A
Host Lane Count: 4
Identifier: QSFP-DD Double Density 8X Pluggable Transceiver
Inactive Firmware: 0.0.0
Length Cable Assembly(m): 0.0
Media Interface Technology: 1310 nm EML
Media Lane Count: 4
Module Hardware Rev: 1.11
Nominal Bit Rate(100Mbs): 0
Specification compliance: sm_media_interface
Supported Max Laser Frequency: N/A
Supported Max TX Power: N/A
Supported Min Laser Frequency: N/A
Supported Min TX Power: N/A
Vendor Date Code(YYYY-MM-DD Lot): 2023-09-25
Vendor Name: CISCO-INNOLIGHT
Vendor OUI: 44-7c-7f
Vendor PN: T-DH8CNT-NCI
Vendor Rev: 2D
Vendor SN: INL2739015D
root@sonic:/home/cisco# sfputil show fwversion Ethernet100
Image A Version: 94.10.0
Image B Version: N/A
Factory Image Version: 3.6.0
Running Image: A
Committed Image: A
Active Firmware: 94.10.0
Inactive Firmware: 0.0.0
@StormLiangMS @yxieca Can you please help in merging this and https://github.com/sonic-net/sonic-platform-common/pull/461 to 202305 and 202311?
@StormLiangMS @yxieca Can you please help with the cherry-pick to 202305 and 202311? MSFT ADO - 27932538
Cherry-pick PR to 202311: https://github.com/sonic-net/sonic-platform-common/pull/463
@mihirpat1 please confirm that this change has been tested on 202305 branch.
@yxieca: I confirm that this change has been tested on the 202305 branch.
Cherry-pick PR to 202305: https://github.com/sonic-net/sonic-platform-common/pull/466