sonic-platform-daemons

CMIS 'ConfigSuccess' failure while changing default AppSel code for 800G DR8/FR8 modules

Open AnoopKamath opened this issue 1 year ago • 1 comments

Description

CMIS 'ConfigSuccess' failure while changing the default AppSel code for 800G DR8/FR8 modules. Issue seen with vendors: Eoptolink, Finisar, Source-Photonics.

Example: if a module supports 800G, 400G and 100G application codes and its default application is 800G, configuring the 2x400G target mode fails with a ConfigRejectedPartialDataPath error. The same failure occurs with the 8x100G application mode.
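To make the example concrete, here is a minimal sketch that derives the possible host lane masks for each application from the `host_lane_count` and `host_lane_assignment_options` values the Eoptolink module advertises in the test log further down. It assumes the CMIS convention that bit n of the lane assignment options marks host lane n+1 as a permitted starting lane. The helper names `start_lanes` and `host_lane_masks` are hypothetical and are not part of xcvrd or sonic-platform-common.

```python
# Values copied from the application_advertisement field in the test log
# (AppSel 1 = 800G, AppSel 3 = 400G, AppSel 5 = 100G).
ADVERTISED = {
    1: {"host_lane_count": 8, "host_lane_assignment_options": 1},
    3: {"host_lane_count": 4, "host_lane_assignment_options": 17},
    5: {"host_lane_count": 1, "host_lane_assignment_options": 255},
}


def start_lanes(assignment_options, total_lanes=8):
    """Return the 1-based host lanes on which an application may start.

    Assumption: bit n set means the application may begin on lane n+1.
    """
    return [lane + 1 for lane in range(total_lanes)
            if assignment_options & (1 << lane)]


def host_lane_masks(app):
    """Return one host lane bitmask per permitted placement of the application."""
    width = (1 << app["host_lane_count"]) - 1
    return [width << (start - 1)
            for start in start_lanes(app["host_lane_assignment_options"])]


if __name__ == "__main__":
    for appsel, app in ADVERTISED.items():
        print(appsel, [hex(mask) for mask in host_lane_masks(app)])
```

For AppSel 3 this yields two placements, lanes 1-4 and lanes 5-8, which is the 2x400G breakout. The `host_lanemask=0xf` in the log corresponds to the first of them.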

Motivation and Context

Extract from the CMIS spec, section 6.2.4.3 "Host Rules and Recommendations": The host can change the width of a Data Path only while in the DPDeactivated state, i.e. the host must always transition an existing Data Path to DPDeactivated before selecting an Application with a different lane count. Any lane that becomes unused must be marked as such (AppSel = 0000b) or it must be assigned to a new valid Data Path (remaining in DPDeactivated state until eventually used).

Fix: reset the AppSel value for all lanes when setting a non-default application.
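The effect of the reset can be illustrated with a small, self-contained model of the per-lane AppSel values. This is only a sketch: `stage_appsel` and `partial_datapath_lanes` are hypothetical helpers, the lane counts are taken from the application advertisement in the test log below, and the partial-datapath check is a deliberately simplified stand-in for what a module actually validates (a real module also considers DataPathID and lane assignment options).

```python
NUM_LANES = 8

# Host lane count per AppSel code, from the advertisement in the test log
# (1 = 800G over 8 lanes, 3 = 400G over 4 lanes, 5 = 100G over 1 lane).
HOST_LANE_COUNT = {1: 8, 3: 4, 5: 1}


def stage_appsel(current, appsel, lanemask, reset_all=False):
    """Return the per-lane AppSel list after staging `appsel` on `lanemask`.

    With reset_all=True every lane is first marked unused (AppSel = 0),
    which models the behaviour proposed in this change.
    """
    staged = [0] * NUM_LANES if reset_all else list(current)
    for lane in range(NUM_LANES):
        if lanemask & (1 << lane):
            staged[lane] = appsel
    return staged


def partial_datapath_lanes(staged):
    """Simplified check: return 1-based lanes left in an incomplete data path.

    An AppSel code is treated as partial when the number of lanes carrying it
    is not a multiple of that application's host lane count.
    """
    partial = []
    for appsel in set(staged) - {0}:
        lanes = [i + 1 for i, value in enumerate(staged) if value == appsel]
        if len(lanes) % HOST_LANE_COUNT[appsel] != 0:
            partial.extend(lanes)
    return sorted(partial)


# Module powers up with the default 800G application on all eight lanes.
default = [1] * NUM_LANES

without_reset = stage_appsel(default, 3, 0x0F)
with_reset = stage_appsel(default, 3, 0x0F, reset_all=True)

if __name__ == "__main__":
    print(without_reset, partial_datapath_lanes(without_reset))
    print(with_reset, partial_datapath_lanes(with_reset))
```

Without the reset, lanes 5-8 still carry the 8-lane AppSel 1 on only four lanes, which is the kind of leftover that a module may reject as a partial data path. With the reset, those lanes are marked unused until the second 400G subport is configured.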

How Has This Been Tested?

Tested with modules from different vendors, switching between different application modes:

root@sonic:/home/cisco# show logging xcvrd | grep Ethernet160
Apr  5 17:11:02.534867 sonic WARNING pmon#xcvrd[29]: $$$ Ethernet160 handle_port_update_event() : op=SET DB:CONFIG_DB Table:PORT fvp {'admin_status': 'up', 'alias': 'etp20a', 'index': '20', 'lanes': '8,9,10,11', 'mtu': '9100', 'speed': '400000', 'subport': '1'}
Apr  5 17:11:02.537697 sonic WARNING pmon#xcvrd[29]: $$$ Ethernet160 handle_port_update_event() : op=SET DB:STATE_DB Table:PORT_TABLE fvp {'state': 'ok', 'netdev_oper_status': 'down', 'admin_status': 'up', 'mtu': '9100', 'supported_speeds': '40000,100000,200000,400000', 'supported_fecs': 'rs', 'host_tx_ready': 'true'}
Apr  5 17:11:02.538174 sonic WARNING pmon#xcvrd[29]: *** Ethernet160CONFIG_DBPORT handle_port_update_event() fvp {'admin_status': 'up', 'alias': 'etp20a', 'index': '20', 'lanes': '8,9,10,11', 'mtu': '9100', 'speed': '400000', 'subport': '1', 'key': 'Ethernet160', 'asic_id': 0, 'op': 'SET'}
Apr  5 17:11:02.550360 sonic WARNING pmon#xcvrd[29]: *** Ethernet160STATE_DBPORT_TABLE handle_port_update_event() fvp {'host_tx_ready': 'true', 'index': '-1', 'key': 'Ethernet160', 'asic_id': 0, 'op': 'SET'}
Apr  5 17:11:03.280329 sonic NOTICE pmon#xcvrd[29]: CMIS: Ethernet160: 400G, lanemask=0x0, state=INSERTED, appl 0 host_lane_count 4 retries=0
Apr  5 17:11:03.341874 sonic NOTICE pmon#xcvrd[29]: CMIS: Ethernet160: Setting appl=3
Apr  5 17:11:03.403239 sonic NOTICE pmon#xcvrd[29]: CMIS: Ethernet160: Setting host_lanemask=0xf
Apr  5 17:11:03.524128 sonic NOTICE pmon#xcvrd[29]: CMIS: Ethernet160: Setting media_lanemask=0xf
Apr  5 17:11:03.531581 sonic NOTICE pmon#xcvrd[29]: CMIS: Changing from default AppSel 1 to non default AppSel code 3. Reset AppSel code for all lanes
Apr  5 17:11:03.556579 sonic NOTICE pmon#xcvrd[29]: CMIS: Ethernet160: force Datapath reinit

Apr  5 17:11:10.936253 sonic WARNING pmon#xcvrd[29]: $$$ Ethernet160 handle_port_update_event() : op=SET DB:STATE_DB Table:TRANSCEIVER_INFO fvp {'host_electrical_interface': '800G L C2M (placeholder)', 'active_apsel_hostlane1': '0', 'model': 'EOLD-138HG-02-41', 'hardware_rev': '1.0', 'vendor_rev': '01', 'active_apsel_hostlane5': '0', 'active_apsel_hostlane3': '0', 'host_lane_assignment_option': '1', 'active_apsel_hostlane2': '0', 'ext_identifier': 'Power Class 8 (17.0W Max)', 'media_interface_code': 'Undefined', 'specification_compliance': 'sm_media_interface', 'application_advertisement': "{1: {'host_electrical_interface_id': '800G L C2M (placeholder)', 'module_media_interface_id': 'Undefined', 'media_lane_count': 8, 'host_lane_count': 8, 'host_lane_assignment_options': 1, 'media_lane_assignment_options': 1}, 2: {'host_electrical_interface_id': '800G S C2M (placeholder)', 'module_media_interface_id': 'Undefined', 'media_lane_count': 8, 'host_lane_count': 8, 'host_lane_assignment_options': 1, 'media_lane_assignment_options': 1}, 3: {'host_electrical_interface_id': '400GAUI-4-L C2M (Annex 120G)', 'module_media_interface_id': '400GBASE-DR4 (Cl 124)', 'media_lane_count': 4, 'host_lane_count': 4, 'host_lane_assignment_options': 17, 'media_lane_assignment_options': 17}, 4: {'host_electrical_interface_id': '400GAUI-4-S C2M (Annex 120G)', 'module_media_interface_id': '400GBASE-DR4 (Cl 124)', 'media_lane_count': 4, 'host_lane_count': 4, 'host_lane_assignment_options': 17, 'media_lane_assignment_options': 17}, 5: {'host_electrical_interface_id': '100GAUI-1-L C2M (Annex 120G)', 'module_media_interface_id': '100G-FR/100GBASE-FR1 (Cl 140)', 'media_lane_count': 1, 'host_lane_count': 1, 'host_lane_assignment_options': 255, 'media_lane_assignment_options': 255}, 6: {'host_electrical_interface_id': '100GAUI-1-S C2M (Annex 120G)', 'module_media_interface_id': '100G-FR/100GBASE-FR1 (Cl 140)', 'media_lane_count': 1, 'host_lane_count': 1, 'host_lane_assignment_options': 255, 
'media_lane_assignment_options': 255}}", 'vendor_oui': '70-ee-a3', 'active_apsel_hostlane6': '0', 'media_lane_count': '8', 'is_replaceable': 'True', 'cable_type': 'Length Cable Assembly(m)', 'connector': 'MPO 1x12', 'ext_rateselect_compliance': 'N/A', 'active_apsel_hostlane4': '0', 'vendor_date': '2023-02-24   ', 'host_lane_count': '8', 'encoding': 'N/A', 'nominal_bit_rate': '0', 'supported_max_tx_power': 'N/A', 'supported_min_laser_freq': 'N/A', 'active_apsel_hostlane8': '0', 'dom_capability': 'N/A', 'supported_max_laser_freq': 'N/A', 'type': 'QSFP-DD Double Density 8X Pluggable Transceiver', 'manufacturer': 'CISCO-EOPTOLINK ', 'media_interface_technology': '1310 nm EML', 'supported_min_tx_power': 'N/A', 'cmis_rev': '5.0', 'media_lane_assignment_option': '1', 'cable_length': '0.0', 'serial': 'EOP27080006     ', 'active_apsel_hostlane7': '0'}
Apr  5 17:11:10.936760 sonic WARNING pmon#xcvrd[29]: *** Ethernet160STATE_DBTRANSCEIVER_INFO handle_port_update_event() 
Apr  5 17:11:11.212687 sonic NOTICE pmon#xcvrd[29]: CMIS: Ethernet160: 400G, lanemask=0xf, state=INSERTED, appl 3 host_lane_count 4 retries=0
Apr  5 17:11:11.274449 sonic NOTICE pmon#xcvrd[29]: CMIS: Ethernet160: Setting appl=3
Apr  5 17:11:11.336291 sonic NOTICE pmon#xcvrd[29]: CMIS: Ethernet160: Setting host_lanemask=0xf
Apr  5 17:11:11.459386 sonic NOTICE pmon#xcvrd[29]: CMIS: Ethernet160: Setting media_lanemask=0xf
Apr  5 17:11:11.473669 sonic NOTICE pmon#xcvrd[29]: CMIS: Ethernet160: force Datapath reinit
Apr  5 17:11:33.323121 sonic NOTICE pmon#xcvrd[29]: CMIS: Ethernet160: 400G, lanemask=0xf, state=DP_DEINIT, appl 3 host_lane_count 4 retries=0
Apr  5 17:11:34.359393 sonic NOTICE pmon#xcvrd[29]: CMIS: Ethernet160: DpDeinit duration 1.0 secs, modulePwrUp duration 10.0 secs
Apr  5 17:11:39.597517 sonic NOTICE pmon#xcvrd[29]: CMIS: Ethernet160: 400G, lanemask=0xf, state=AP_CONFIGURED, appl 3 host_lane_count 4 retries=0
Apr  5 17:11:39.635618 sonic NOTICE pmon#xcvrd[29]: CMIS: Ethernet160: Apply Optics SI found for Vendor: CISCO-EOPTOLINK   PN: EOLD-138HG-02-41 lane speed: 100G
Apr  5 17:11:50.731261 sonic NOTICE pmon#xcvrd[29]: CMIS: Ethernet160: 400G, lanemask=0xf, state=DP_INIT, appl 3 host_lane_count 4 retries=0
Apr  5 17:11:50.746872 sonic NOTICE pmon#xcvrd[29]: CMIS: Ethernet160: DpInit duration 10.0 secs
Apr  5 17:11:57.587755 sonic NOTICE pmon#xcvrd[29]: CMIS: Ethernet160: 400G, lanemask=0xf, state=DP_TXON, appl 3 host_lane_count 4 retries=0
Apr  5 17:11:57.598835 sonic NOTICE pmon#xcvrd[29]: CMIS: Ethernet160: Turning ON tx power
Apr  5 17:12:03.315972 sonic NOTICE pmon#xcvrd[29]: CMIS: Ethernet160: 400G, lanemask=0xf, state=DP_ACTIVATION, appl 3 host_lane_count 4 retries=0
Apr  5 17:12:03.319969 sonic NOTICE pmon#xcvrd[29]: CMIS: Ethernet160: READY
Apr  5 17:12:03.390218 sonic NOTICE pmon#xcvrd[29]: CMIS: Ethernet160: updated TRANSCEIVER_INFO_TABLE [('active_apsel_hostlane1', '3'), ('active_apsel_hostlane2', '3'), ('active_apsel_hostlane3', '3'), ('active_apsel_hostlane4', '3'), ('host_lane_count', '4'), ('media_lane_count', '4')]
Apr  5 17:12:06.651543 sonic WARNING pmon#xcvrd[29]: $$$ Ethernet160 handle_port_update_event() :
Apr  5 17:12:06.651646 sonic WARNING pmon#xcvrd[29]: *** Ethernet160STATE_DBTRANSCEIVER_INFO handle_port_update_event() fvp 
Apr  5 17:12:06.655136 sonic NOTICE pmon#xcvrd[29]: CMIS: Ethernet160: 400G, lanemask=0xf, state=INSERTED, appl 3 host_lane_count 4 retries=0
Apr  5 17:12:06.714680 sonic NOTICE pmon#xcvrd[29]: CMIS: Ethernet160: Setting appl=3
Apr  5 17:12:06.774654 sonic NOTICE pmon#xcvrd[29]: CMIS: Ethernet160: Setting host_lanemask=0xf
Apr  5 17:12:06.895745 sonic NOTICE pmon#xcvrd[29]: CMIS: Ethernet160: Setting media_lanemask=0xf
Apr  5 17:12:06.917114 sonic NOTICE pmon#xcvrd[29]: CMIS: Ethernet160: no CMIS application update required...READY
Apr  5 17:12:14.943653 sonic WARNING pmon#xcvrd[29]: $$$ Ethernet160 handle_port_update_event() : op=SET DB:STATE_DB Table:PORT_TABLE fvp {'state': 'ok', 'netdev_oper_status': 'up', 'admin_status': 'up', 'mtu': '9100', 'supported_speeds': '40000,100000,200000,400000', 'supported_fecs': 'rs', 'host_tx_ready': 'true', 'speed': '400000'}
Apr  5 17:12:14.943750 sonic WARNING pmon#xcvrd[29]: $$$ Ethernet160 handle_port_update_event() : op=SET DB:STATE_DB Table:PORT_TABLE fvp {'state': 'ok', 'netdev_oper_status': 'up', 'admin_status': 'up', 'mtu': '9100', 'supported_speeds': '40000,100000,200000,400000', 'supported_fecs': 'rs', 'host_tx_ready': 'true', 'speed': '400000'}
root@sonic:/home/cisco# 
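When reviewing a capture like the one above, it can help to reduce it to the sequence of CMIS states the port went through. The following is a minimal sketch that assumes the log line format shown in this capture. The regular expression and the helper name `cmis_states` are illustrative only and are not part of xcvrd.

```python
import re

# Matches the per-iteration state lines that xcvrd prints in the capture above.
CMIS_STATE_RE = re.compile(
    r"CMIS: (?P<port>[^:\s]+): (?P<speed>\d+G), "
    r"lanemask=(?P<lanemask>0x[0-9a-fA-F]+), state=(?P<state>\w+), "
    r"appl (?P<appl>\d+) host_lane_count (?P<lanes>\d+) retries=(?P<retries>\d+)"
)


def cmis_states(lines, port):
    """Return (state, appl, lanemask) tuples for `port`, in log order."""
    states = []
    for line in lines:
        match = CMIS_STATE_RE.search(line)
        if match and match.group("port") == port:
            states.append((
                match.group("state"),
                int(match.group("appl")),
                int(match.group("lanemask"), 16),
            ))
    return states


# Three lines copied from the capture above.
SAMPLE = [
    "Apr  5 17:11:03.280329 sonic NOTICE pmon#xcvrd[29]: CMIS: Ethernet160: 400G, lanemask=0x0, state=INSERTED, appl 0 host_lane_count 4 retries=0",
    "Apr  5 17:11:03.341874 sonic NOTICE pmon#xcvrd[29]: CMIS: Ethernet160: Setting appl=3",
    "Apr  5 17:11:33.323121 sonic NOTICE pmon#xcvrd[29]: CMIS: Ethernet160: 400G, lanemask=0xf, state=DP_DEINIT, appl 3 host_lane_count 4 retries=0",
]

if __name__ == "__main__":
    for state, appl, lanemask in cmis_states(SAMPLE, "Ethernet160"):
        print(f"{state:<14} appl={appl} lanemask={lanemask:#x}")
```

Applied to the full capture, this would list the INSERTED, DP_DEINIT, AP_CONFIGURED, DP_INIT, DP_TXON and DP_ACTIVATION entries in order, which makes it easier to spot a port that loops or stalls before READY.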

Additional Information (Optional)

AnoopKamath avatar Apr 05 '24 17:04 AnoopKamath

@AnoopKamath Please test

  1. Xcvrd restart
  2. Config interface shut/no-shut on one 2x400G port does not impact the other datapath

prgeor avatar Apr 09 '24 21:04 prgeor

  1. Tested xcvrd restart; the modules went to the READY state.
  2. Tested config shut/no-shut on 4 different modules; it does not impact the other datapath.

Logs attached

AnoopKamath avatar May 14 '24 06:05 AnoopKamath

@AnoopKamath can you use this API? https://github.com/sonic-net/sonic-platform-common/pull/471

prgeor avatar May 15 '24 19:05 prgeor

@prgeor: UT looks good after I patched https://github.com/sonic-net/sonic-platform-common/pull/471/files. I will update the PR after you merge these changes. Thanks

AnoopKamath avatar May 15 '24 23:05 AnoopKamath

Cherry-pick PR to 202311: https://github.com/sonic-net/sonic-platform-daemons/pull/490

mssonicbld avatar May 18 '24 04:05 mssonicbld

@prgeor @lguohan can we back-port this to 202305? This impacts not only the 800G modules but also 400G modules with breakouts.

zhenggen-xu avatar May 29 '24 23:05 zhenggen-xu