sonic-buildimage

ERR swss1#orchagent: :- handlePortStatusChangeNotification: Failed to get port object for port id 0x10100000000010b

Open saksarav-nokia opened this issue 1 year ago • 4 comments

Description

87 ixre-egl-board33 ERR swss1#orchagent: :- handlePortStatusChangeNotification: Failed to get port object for port id 0x10100000000010b
Sep 6 18:54:23.756348 ixre-egl-board33 ERR swss1#orchagent: :- handlePortStatusChangeNotification: Failed to get port object for port id 0x10100000000010a
Sep 6 18:54:23.772470 ixre-egl-board33 ERR swss1#orchagent: :- handlePortStatusChangeNotification: Failed to get port object for port id 0x101000000000120
Sep 6 18:54:24.515300 ixre-egl-board33 ERR swss1#orchagent: :- handlePortStatusChangeNotification: Failed to get port object for port id 0x10100000000011e
Sep 6 18:54:24.676670 ixre-egl-board33 ERR swss0#orchagent: :- handlePortStatusChangeNotification: Failed to get port object for port id 0x10000000000fb
Sep 6 18:54:25.428216 ixre-egl-board33 ERR swss0#orchagent: :- handlePortStatusChangeNotification: Failed to get port object for port id 0x1000000000106
Sep 6 18:54:25.801326 ixre-egl-board33 ERR swss0#orchagent: :- handlePortStatusChangeNotification: Failed to get port object for port id 0x1000000000108
Sep 6 18:54:26.376037 ixre-egl-board33 ERR swss1#orchagent: :- handlePortStatusChangeNotification: Failed to get port object for port id 0x101000000000122
Sep 6 18:54:26.584348 ixre-egl-board33 ERR swss0#orchagent: :- handlePortStatusChangeNotification: Failed to get port object for port id 0x1000000000107
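To check which port ids are affected on each ASIC instance, the error lines can be grouped by swss container. The following is a minimal sketch that is not part of the original report: the helper name `failed_port_ids` and the regular expression are assumptions, written only against the log format shown above.

```python
import re
from collections import defaultdict

# Matches only the error line format quoted in this issue.
LOG_PATTERN = re.compile(
    r"ERR (?P<container>swss\d+)#orchagent: :- handlePortStatusChangeNotification: "
    r"Failed to get port object for port id (?P<oid>0x[0-9a-fA-F]+)"
)


def failed_port_ids(log_text):
    """Group the port ids from the error lines by swss container name."""
    grouped = defaultdict(list)
    for match in LOG_PATTERN.finditer(log_text):
        grouped[match.group("container")].append(match.group("oid"))
    return dict(grouped)


# Two lines copied from the syslog excerpt in the description.
sample = (
    "Sep 6 18:54:23.756348 ixre-egl-board33 ERR swss1#orchagent: :- "
    "handlePortStatusChangeNotification: Failed to get port object for "
    "port id 0x10100000000010a\n"
    "Sep 6 18:54:24.676670 ixre-egl-board33 ERR swss0#orchagent: :- "
    "handlePortStatusChangeNotification: Failed to get port object for "
    "port id 0x10000000000fb\n"
)

for container, oids in failed_port_ids(sample).items():
    print(container, oids)
```

The resulting ids could then be compared against the fabric port ids known to the platform to confirm whether every failing id really is a fabric port.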

Steps to reproduce the issue:

  1. Reboot the Line card in a chassis and check the syslog
  2. The error messages are seen for all fabric ports

Describe the results you received:

The error messages are seen for all fabric ports when the Line card is rebooted and the fabric ports come up

Describe the results you expected:

No Error messages

Output of show version:

202205 latest

(paste your output here)

Output of show techsupport:

(paste your output here or download and attach the file here )

Additional information you deem important (e.g. issue happens only occasionally):

saksarav-nokia avatar Sep 07 '23 20:09 saksarav-nokia

These error messages are seen for the fabric ports on the line card.

saksarav-nokia avatar Sep 07 '23 20:09 saksarav-nokia

@rlhui could you help to look into this chassis related issue?

prgeor avatar Sep 13 '23 15:09 prgeor

@kenneth-arista Can we handle this as part of the fabric link monitoring changes? This error is seen on line cards and not on fabric cards. Since there are no fabric ports on the LC, each SAI port up/down event logs this message because no fabric port is available.

Here is the analysis from Sakthi. Please check. For the front panel ports, portsyncd reads the ports from CONFIG_DB and updates them in APPL_DB. Then doTask in orchagent's portsorch.cpp processes the APPL_DB notifications and adds the ports to m_portList. When a port change notification is received from SAI, orchagent calls getPort, which looks the port up in m_portList and finds it, so no error message is logged. Fabric ports are not added to APPL_DB even if fabric_port-config.ini is present, and hence they are not added to m_portList in orchagent. So when the port change notification is received, getPort fails and the error message is printed. In master, I see the fabric ports get added to FABRIC_PORT_TABLE in APPL_DB; however, there is no code in portsyncd or orchagent that processes the events from FABRIC_PORT_TABLE and adds the ports to the port_list map.
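The lookup path described in this analysis can be modelled in a few lines. This is an illustrative Python sketch only, not the orchagent C++ code: the class `PortsOrchModel`, the function `notify_port_status_change`, and the front panel port id are hypothetical, while the fabric port id is taken from the syslog in the description.

```python
class PortsOrchModel:
    """Toy stand-in for the port map described above (m_portList)."""

    def __init__(self):
        self.port_list = {}  # port id -> alias

    def add_port_from_app_db(self, alias, port_oid):
        # Models doTask adding a port that portsyncd published to APPL_DB.
        self.port_list[port_oid] = alias

    def get_port(self, port_oid):
        # Models getPort: returns None when the port id is unknown.
        return self.port_list.get(port_oid)


def notify_port_status_change(orch, port_oid):
    """Return the log line a status change notification would produce."""
    alias = orch.get_port(port_oid)
    if alias is None:
        return f"ERR Failed to get port object for port id {port_oid:#x}"
    return f"NOTICE {alias} status changed"


orch = PortsOrchModel()
orch.add_port_from_app_db("Ethernet0", 0x1000000000002)  # hypothetical id

print(notify_port_status_change(orch, 0x1000000000002))  # front panel port
print(notify_port_status_change(orch, 0x1000000000106))  # never added to the map
```

Under this model, any port that is never inserted into the map produces the error on every notification, which matches the behaviour the analysis attributes to fabric ports.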

vmittal-msft avatar Apr 09 '24 20:04 vmittal-msft

I tried rebooting a linecard on an Arista chassis using the 202205 image and did not see the error messages mentioned above. I checked on both the sup and the linecard; neither has these error messages. I printed out all the ports whose status changed, and there are no fabric ports among them on our system. So I could not verify whether the OIDs in the messages mentioned in the issue, e.g. "handlePortStatusChangeNotification: Failed to get port object for port id 0x101000000000122", are from fabric ports or not.

Then I tried another quick test: using diag commands to take a fabric link down or disable it. In this case, I can see the handlePortStatusChangeNotification message for that specific link. So in the linecard reboot case, I basically do not see the fabric link down/up notifications.

From reading the code, P4Orch implements handlePortStatusChangeNotification, and it only checks in PortsOrch whether the port is a front panel port or one of the special ports. It does not register or check FabricPortOrch.

I don't think it is related to the APP_DB tables mentioned in the comments. I could add code here to mute the error messages if they are from fabric ports, but that would still not fix the issue if the OIDs in the issue are not from fabric ports at all. Also, maybe saksarav-nokia can help check why the links flap on Nokia systems.

jfeng-arista avatar Apr 14 '24 21:04 jfeng-arista

@vmittal-msft, @saksarav-nokia As @jfeng-arista indicated, the error log comes from P4Orch::handlePortStatusChangeNotification (https://github.com/sonic-net/sonic-swss/blob/9e183a650cf38e0d5af46adc1342dcde85ecb84d/orchagent/p4orch/p4orch.cpp#L183) when there is a port_state_change. In other words, when there is a port flap. P4Orch is not currently relevant to T2 chassis. As such, these logs are benign. However, as you are seeing these errors on all fabric ports during linecard reboot, consider investigating why these flaps are happening on your SKU. We haven't seen these logs on rebooting an Arista linecard.

Some actions that @saksarav-nokia and team should follow up on:

  1. Investigate fabric port flaps seen on ixre-egl-board33
  2. If necessary, add logic to P4Orch to skip fabric ports
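The second action item could look roughly like the sketch below. It is written in Python for illustration only and is not the P4Orch C++ code: the function name, the `fabric_port_oids` set, and the front panel port id are assumptions, and how P4Orch would actually learn the fabric port ids is left open.

```python
def handle_status_change_skipping_fabric(port_oid, known_ports, fabric_port_oids):
    """Return the log line to emit, or None when the event is skipped."""
    if port_oid in fabric_port_oids:
        # Fabric port: not relevant to this handler, so skip it silently.
        return None
    alias = known_ports.get(port_oid)
    if alias is None:
        # Still an error: the port id is neither a known port nor a fabric port.
        return f"ERR Failed to get port object for port id {port_oid:#x}"
    return f"NOTICE {alias} status changed"


known_ports = {0x1000000000002: "Ethernet0"}  # hypothetical front panel port id
# Ids taken from the syslog above, assumed here to be fabric ports.
fabric_port_oids = {0x1000000000106, 0x1000000000107}

print(handle_status_change_skipping_fabric(0x1000000000106, known_ports, fabric_port_oids))
print(handle_status_change_skipping_fabric(0x1000000000abc, known_ports, fabric_port_oids))
```

Skipping only known fabric port ids keeps the error visible for ids that are truly unknown, which addresses the concern raised earlier that muting would hide the problem if the ids are not fabric ports.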

kenneth-arista avatar Apr 22 '24 19:04 kenneth-arista

@arlakshm please help reassign this issue to @saksarav-nokia

kenneth-arista avatar Apr 22 '24 19:04 kenneth-arista

@saksarav-nokia to update the issue with findings on Nokia devices

arlakshm avatar May 08 '24 17:05 arlakshm

I don't see this issue on the latest 202205 image. Closing for now.

vmittal-msft avatar Aug 29 '24 00:08 vmittal-msft