sonic-platform-daemons

Enhance/fix media_settings infra for 100G QSFP28 and DPB etc

Open longhuan-cisco opened this issue 6 months ago • 1 comment

Description

Enhance/fix the media_settings infra in the following aspects:

  1. Support/fix for 100G QSFP28 transceivers:

    • Fix the issue of media_key being parsed as QSFP28-Unknown-..., because the compliance code of these modules is defined in the Extended Specification Compliance field rather than in the 10/40G Ethernet Compliance Code field.
      Example transceiver info for 100G QSFP28:
      
      root@sonic:/home/cisco# show int trans info Ethernet176
      Ethernet176: SFP EEPROM detected
              Application Advertisement: N/A
              ...
              Identifier: QSFP28 or later
              ...
              Specification compliance:
                      10/40G Ethernet Compliance Code: Unknown
                      Extended Specification Compliance: 100GBASE-CR4, 25GBASE-CR CA-25G-L or 50GBASE-CR2 with RS
              ...
      

      Solution: Check the Extended Specification Compliance field for QSFP28 100G modules.
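      A minimal sketch of this fallback, assuming the Specification compliance fields are available as a flat dict keyed by the labels shown in the output above. get_media_compliance_code() is a hypothetical helper, not the function name used in xcvrd, and the actual patch may additionally gate the fallback on the QSFP28 identifier:

```python
# Hypothetical sketch: choose the compliance string embedded in the media key
# for non-CMIS QSFP28 modules. The flat dict shape (keys copied from the
# `show interfaces transceiver info` labels) is an assumption for illustration.

UNKNOWN = "Unknown"


def get_media_compliance_code(spec_compliance: dict) -> str:
    """Return the compliance code to use when building the media key.

    Prefer the 10/40G Ethernet Compliance Code; when it is 'Unknown'
    (typical for 100G QSFP28), fall back to the Extended Specification
    Compliance field.
    """
    code = spec_compliance.get("10/40G Ethernet Compliance Code", UNKNOWN)
    if code == UNKNOWN:
        code = spec_compliance.get("Extended Specification Compliance", UNKNOWN)
    return code
```

      With the Ethernet176 output above, the helper returns the Extended Specification Compliance string instead of Unknown, so the media key is no longer built around Unknown.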

    • Fix the issue of lane_speed_key being parsed as None, because QSFP28 modules have no Application Advertisement field (a CMIS-specific field containing the host_electrical_interface_id, which the current logic uses as the lane speed key).

      Solution: For non-CMIS modules, compute the lane speed directly from port_speed and lane_count and use it as the key. For example, 100G / 4 = 25G, so the lane speed key is speed:25G.
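      The arithmetic can be sketched as follows, assuming port_speed is taken in Mb/s as stored in the CONFIG_DB PORT table (e.g. 100000 for 100G). get_lane_speed_key() and LANE_SPEED_KEY_PREFIX are illustrative names, not necessarily those used in the PR:

```python
# Hypothetical sketch of the non-CMIS lane speed key.
# Assumption: port_speed_mbps uses the CONFIG_DB unit (Mb/s), e.g. 100000.

LANE_SPEED_KEY_PREFIX = "speed:"


def get_lane_speed_key(port_speed_mbps: int, lane_count: int) -> str:
    """Build a key such as 'speed:25G' from port speed and lane count."""
    if lane_count <= 0:
        raise ValueError("lane_count must be positive")
    lane_speed_mbps = port_speed_mbps // lane_count
    # Express whole-gigabit lane speeds as e.g. '25G'; otherwise keep Mb/s.
    if lane_speed_mbps % 1000 == 0:
        return f"{LANE_SPEED_KEY_PREFIX}{lane_speed_mbps // 1000}G"
    return f"{LANE_SPEED_KEY_PREFIX}{lane_speed_mbps}M"
```

      For instance, get_lane_speed_key(100000, 4) returns 'speed:25G', matching the example above.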

  2. Support/fix for DPB situations:

    • Fix the issue that SerDes SI values for the wrong lanes are picked up from media_settings.json, due to the following two mistakes in the current logic of get_media_val_str():
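      Problem example (wrong lane count per logical port, derived from the number of logical ports seen so far while breakout ports are added one by one):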
      
      root@sonic:/home/cisco# config interface breakout Ethernet176 "4x25G" -fy 
      root@sonic:/home/cisco# show int status | grep -E "Ethernet17[6-9]"
      Ethernet176                   20      25G   9100    N/A   etp44a  routed      up       up  100GBASE-CR4         N/A
      Ethernet177                   21      25G   9100    N/A   etp44b  routed      up       up  100GBASE-CR4         N/A
      Ethernet178                   22      25G   9100    N/A   etp44c  routed      up       up  100GBASE-CR4         N/A
      Ethernet179                   23      25G   9100    N/A   etp44d  routed      up       up  100GBASE-CR4         N/A
      
      >>-- port_mapping.handle_port_change_event() called for Ethernet176, and got inserted to port_mapping.physical_to_logical[44]
      Aug 18 04:46:51.967840 sonic NOTICE pmon#xcvrd[151847]: Publishing ASIC-side SI setting for port Ethernet176 (num_logical_ports=1, logical_idx=0) in APP_DB:
      Aug 18 04:46:51.967840 sonic NOTICE pmon#xcvrd[151847]: 0:(main,0x1a,0x1b,0x1c,0x1d)     --> should be (main,0x1a) instead
      ......
      >>-- port_mapping.handle_port_change_event() called for Ethernet177, and got inserted to port_mapping.physical_to_logical[44]      
      Aug 18 04:46:52.027793 sonic NOTICE pmon#xcvrd[151847]: Publishing ASIC-side SI setting for port Ethernet177 (num_logical_ports=2, logical_idx=1) in APP_DB:
      Aug 18 04:46:52.027818 sonic NOTICE pmon#xcvrd[151847]: 0:(main,0x1c,0x1d)     --> should be (main,0x1b) instead
      ......
      >>-- port_mapping.handle_port_change_event() called for Ethernet178, and got inserted to port_mapping.physical_to_logical[44]
      Aug 18 04:46:52.085544 sonic NOTICE pmon#xcvrd[151847]: Publishing ASIC-side SI setting for port Ethernet178 (num_logical_ports=3, logical_idx=2) in APP_DB:
      Aug 18 04:46:52.085544 sonic NOTICE pmon#xcvrd[151847]: 0:(main,0x1c)
      ......
      >>-- port_mapping.handle_port_change_event() called for Ethernet179, and got inserted to port_mapping.physical_to_logical[44]
      Aug 18 04:46:52.142960 sonic NOTICE pmon#xcvrd[151847]: Publishing ASIC-side SI setting for port Ethernet179 (num_logical_ports=4, logical_idx=3) in APP_DB:
      Aug 18 04:46:52.142960 sonic NOTICE pmon#xcvrd[151847]: 0:(main,0x1d)
      ......
      

      Solution: Use the per-logical-port lane_count obtained directly from the 'lanes' field in the CONFIG_DB PORT table.
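      A sketch of deriving the lane count from the port's own PORT entry, assuming 'lanes' holds a comma-separated list of SerDes lane numbers. The lane numbers in the example entries are invented for illustration, and get_lane_count() is a hypothetical helper:

```python
# Hypothetical sketch: lane count of one logical port, taken from its own
# CONFIG_DB PORT entry rather than from how many sibling ports are known.


def get_lane_count(port_config: dict) -> int:
    """Count the non-empty comma-separated entries in the 'lanes' field."""
    lanes = port_config.get("lanes", "")
    return len([lane for lane in lanes.split(",") if lane.strip()])


# Invented example entries: Ethernet176 before and after "4x25G" breakout.
ETHERNET176_100G = {"lanes": "177,178,179,180", "speed": "100000"}
ETHERNET176_4X25G = {"lanes": "177", "speed": "25000"}
```

      Because the count comes from the port's own entry, it no longer depends on how many sibling breakout ports xcvrd has processed at the time of the port change event.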

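      Problem example (wrong logical port index, derived from the order in which ports are inserted into port_mapping.logical_port_list):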
      
      When xcvrd comes up (system bootup or process restart), Ethernet176 is actually the 1st logical port, but it is the last one inserted into port_mapping.logical_port_list, so it is wrongly treated as the 4th logical port:
      Aug 18 04:32:21.155536 sonic NOTICE pmon#xcvrd[151847]: Publishing ASIC-side SI setting for port Ethernet177 (num_logical_ports=4, logical_idx=0) in APP_DB:
      Aug 18 04:32:21.155536 sonic NOTICE pmon#xcvrd[151847]: 0:(main,0x1b)
      Aug 18 04:32:21.830787 sonic NOTICE pmon#xcvrd[151847]: Publishing ASIC-side SI setting for port Ethernet178 (num_logical_ports=4, logical_idx=1) in APP_DB:
      Aug 18 04:32:21.830855 sonic NOTICE pmon#xcvrd[151847]: 0:(main,0x1c)
      Aug 18 04:32:21.895498 sonic NOTICE pmon#xcvrd[151847]: Publishing ASIC-side SI setting for port Ethernet179 (num_logical_ports=4, logical_idx=2) in APP_DB:
      Aug 18 04:32:21.895537 sonic NOTICE pmon#xcvrd[151847]: 0:(main,0x1d)
      Aug 18 04:32:22.166040 sonic NOTICE pmon#xcvrd[151847]: Publishing ASIC-side SI setting for port Ethernet176 (num_logical_ports=4, logical_idx=3) in APP_DB:
      Aug 18 04:32:22.166121 sonic NOTICE pmon#xcvrd[151847]: 0:(main,0x1a)
      

      Solution: Use the subport number obtained directly from the CONFIG_DB PORT table as the logical port index (the subport field is now always populated automatically).
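      Combining the per-port lane count with the subport index, the lane slice for a logical port could look like the sketch below. It assumes subport is 1-based, treats 0 as a non-breakout port, and assumes every subport of the physical port has the same lane count; select_lane_values() is a hypothetical helper:

```python
# Hypothetical sketch: pick the SerDes SI values that belong to one logical
# port out of the per-physical-port list (lane0, lane1, ...) in
# media_settings.json, using subport and lane_count from CONFIG_DB.
from typing import List


def select_lane_values(values: List[str], subport: int, lane_count: int) -> List[str]:
    """Return the lane_count values for the given (1-based) subport.

    Assumes a uniform breakout, so subport N starts at lane
    (N - 1) * lane_count. Subport 0 is treated as 'not broken out'.
    """
    index = max(subport - 1, 0)
    start = index * lane_count
    end = start + lane_count
    if end > len(values):
        raise ValueError("not enough lane values for this subport")
    return values[start:end]


# Per-lane values for physical port 44 from the logs above.
PHYSICAL_PORT_VALUES = ["0x1a", "0x1b", "0x1c", "0x1d"]
```

      For the 4x25G example, subports 1 to 4 with lane_count=1 yield 0x1a, 0x1b, 0x1c and 0x1d respectively, which are the values the logs above say each port should get, independent of processing order. Mixed breakout modes (e.g. one 50G port plus two 25G ports) would instead need the start offset computed from the preceding subports' lane counts.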

  3. Add regular expression support for lane_speed_key, so that multiple lane speed keys sharing the same lane speed or the same SerDes SI values can be grouped into one entry, e.g. speed:200GAUI-8|100GAUI-4|50GAUI-2|25G
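    One way to match such grouped keys, sketched under the assumption that the part after speed: is treated as a Python regular expression that must match the whole lane speed value. match_lane_speed_key() is a hypothetical helper, not necessarily how the PR implements it:

```python
# Hypothetical sketch: find the media_settings lane speed key whose regex
# (the text after 'speed:') fully matches a port's lane speed value.
import re
from typing import Iterable, Optional

LANE_SPEED_KEY_PREFIX = "speed:"


def match_lane_speed_key(config_keys: Iterable[str], lane_speed_key: str) -> Optional[str]:
    """Return the first config key whose pattern fully matches lane_speed_key."""
    if not lane_speed_key.startswith(LANE_SPEED_KEY_PREFIX):
        return None
    target = lane_speed_key[len(LANE_SPEED_KEY_PREFIX):]
    for key in config_keys:
        if not key.startswith(LANE_SPEED_KEY_PREFIX):
            continue
        # Strip the prefix so the alternation applies to every group member.
        pattern = key[len(LANE_SPEED_KEY_PREFIX):]
        if re.fullmatch(pattern, target):
            return key
    return None
```

    Stripping the speed: prefix before matching matters: in the raw string speed:200GAUI-8|100GAUI-4|50GAUI-2|25G, alternation binds loosest, so speed: would belong only to the first alternative and speed:25G would not fully match.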

  4. Add lane_speed_key support under the Default vendor/media key. Also add support for speed:Default, which supplies a default SerDes SI setting when none of the available lane speed keys match.

    Example:
     {
        'GLOBAL_MEDIA_SETTINGS': {
            '0-31': {
                'Default': {
                    'speed:400GAUI-8': {'idriver': {'lane0': '0x1a', ...}, ...},
                    'speed:200GAUI-8|100GAUI-4|50GAUI-2|25G': {'idriver': {'lane0': '0x1b', ...}, ...},
                    'speed:Default': {'idriver': {'lane0': '0x1c', ...}, ...},
                }
            },
        }
    }
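    A sketch of the lookup order this enables, assuming a vendor key is tried first, then the media key, then Default, and that speed:Default is consulted only when no other lane speed key matches. The helper names and the accepted port-range formats ('0-31', '1,2,5') are assumptions for illustration, not the parser's actual API:

```python
# Hypothetical sketch of resolving SI settings from GLOBAL_MEDIA_SETTINGS,
# including the Default vendor/media key and the speed:Default fallback.
import re
from typing import Optional

LANE_SPEED_KEY_PREFIX = "speed:"
DEFAULT_KEY = "Default"


def port_in_range(range_key: str, port: int) -> bool:
    """Accept keys such as '0-31' or '1,2,5' (assumed formats)."""
    for part in range_key.split(","):
        part = part.strip()
        if "-" in part:
            low, high = part.split("-", 1)
            if int(low) <= port <= int(high):
                return True
        elif part.isdigit() and int(part) == port:
            return True
    return False


def find_speed_entry(speed_dict: dict, lane_speed_key: str) -> Optional[dict]:
    """Regex-match lane speed keys; use speed:Default only if nothing matches."""
    target = lane_speed_key[len(LANE_SPEED_KEY_PREFIX):]
    for key, value in speed_dict.items():
        if not key.startswith(LANE_SPEED_KEY_PREFIX):
            continue
        pattern = key[len(LANE_SPEED_KEY_PREFIX):]
        if pattern != DEFAULT_KEY and re.fullmatch(pattern, target):
            return value
    return speed_dict.get(LANE_SPEED_KEY_PREFIX + DEFAULT_KEY)


def lookup_si_settings(media_settings: dict, port: int, vendor_key: str,
                       media_key: str, lane_speed_key: str) -> Optional[dict]:
    """Try vendor key, then media key, then Default within the port's range."""
    ranges = media_settings.get("GLOBAL_MEDIA_SETTINGS", {})
    for range_key, keys_dict in ranges.items():
        if not port_in_range(range_key, port):
            continue
        for key in (vendor_key, media_key, DEFAULT_KEY):
            if key in keys_dict:
                return find_speed_entry(keys_dict[key], lane_speed_key)
    return None
```

    With the example above, a port in range 0-31 whose lane speed key is speed:25G resolves to the 0x1b entry, while an unmatched key such as speed:10G falls back to the speed:Default entry (0x1c).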
    
  5. Improve code coverage of media_settings_parser.py to 97%.

Motivation and Context

This PR mainly makes sure that the media_settings infra works properly for 100G QSFP28, DPB, and the other cases listed above.

How Has This Been Tested?

Verified that the proper settings are notified with different transceivers in both DPB and non-DPB cases.
Verified compatibility with existing media_settings.json files.

Additional Information (Optional)

longhuan-cisco, Aug 18 '24 09:08