[orchagent] orchagent gets stuck when removing a VLAN member

Open Yang-Yongzhi opened this issue 1 year ago • 0 comments

Description

There is an intermittent bug that makes orchagent hang when a VLAN member is removed and an interface is then created immediately on the same port (by assigning it an IP address).

Steps to reproduce the issue

  1. Set the orchagent log level to DEBUG (this makes the bug easier to reproduce):
    swssloglevel -l DEBUG -c orchagent
    
  2. Apply the following configuration:
    sudo config vlan member del 1000 EthernetXX
    sudo config interface ip add EthernetXX 1.1.1.1/24
    
    sudo config vlan member del 1000 EthernetYY
    sudo config interface ip add EthernetYY 1.1.1.1/24
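
For repeated attempts, the steps above can be wrapped in a small helper. This is only a sketch: the function names `run` and `repro_vlan_member_hang` and the `DRY_RUN` switch are made up for illustration, and it uses only the commands already listed in this report (including the same 1.1.1.1/24 address for every port).

```shell
# Sketch of the reproduction steps from this report.
# Hypothetical helpers: run, repro_vlan_member_hang; DRY_RUN is also an assumption.

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Usage: repro_vlan_member_hang VLAN_ID PORT [PORT...]
repro_vlan_member_hang() {
    local vlan="$1"
    shift
    local port
    # Step 1: raise orchagent logging to DEBUG (makes the hang easier to hit).
    run swssloglevel -l DEBUG -c orchagent
    # Step 2: for each port, remove the VLAN member, then immediately add an IP.
    for port in "$@"; do
        run sudo config vlan member del "$vlan" "$port"
        run sudo config interface ip add "$port" 1.1.1.1/24
    done
}
```

Example: `DRY_RUN=1 repro_vlan_member_hang 1000 EthernetXX EthernetYY` prints the five commands without running them; drop `DRY_RUN=1` to run them on the switch.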
    

Describe the results you received

  1. orchagent uses 100% CPU.
  2. New configuration has no effect.
  3. syslog shows the following message:

    2024 Sep 4 10:47:59.853265 sonic WARNING swss#supervisor-proc-exit-listener: Process 'orchagent' is stuck in namespace 'host' (2.0 minutes).
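
To check whether an attempt actually hit the hang, the warning above can be searched for in the log. A minimal sketch, assuming syslog is at `/var/log/syslog` (the path and the function name `count_orchagent_stuck` are assumptions, not taken from this report):

```shell
# Count "orchagent is stuck" warnings in a syslog file.
# Hypothetical helper; the default path /var/log/syslog is an assumption.
count_orchagent_stuck() {
    local logfile="${1:-/var/log/syslog}"
    # -F: match the text literally; -c: print the number of matching lines.
    # grep exits non-zero when there are no matches, so ignore that status.
    grep -c -F "Process 'orchagent' is stuck" "$logfile" || true
}
```

A non-zero count after running the reproduction steps suggests the hang was triggered.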

Describe the results you expected

Everything works normally: orchagent does not hang and new configuration takes effect.

Output of show version

Output of show techsupport

Attaching gdb to orchagent in the swss container shows the following backtrace:

(gdb) bt 
#0  0x00007fa3f5701197 in std::_Rb_tree_rebalance_for_erase(std::_Rb_tree_node_base*, std::_Rb_tree_node_base&) () from /lib/x86_64-linux-gnu/libstdc++.so.6
#1  0x00005636a394c999 in std::_Rb_tree<unsigned short, std::pair<unsigned short const, swss::VlanMemberEntry>, std::_Select1st<std::pair<unsigned short const, swss::VlanMemberEntry> >, std::less<unsigned short>, std::allocator<std::pair<unsigned short const, swss::VlanMemberEntry> > >::_M_erase_aux (__position=..., this=0x5636a518c8b0) at /usr/include/c++/12/bits/stl_tree.h:2493
#2  std::_Rb_tree<unsigned short, std::pair<unsigned short const, swss::VlanMemberEntry>, std::_Select1st<std::pair<unsigned short const, swss::VlanMemberEntry> >, std::less<unsigned short>, std::allocator<std::pair<unsigned short const, swss::VlanMemberEntry> > >::erase[abi:cxx11](std::_Rb_tree_iterator<std::pair<unsigned short const, swss::VlanMemberEntry> >) (__position=..., this=0x5636a518c8b0)
    at /usr/include/c++/12/bits/stl_tree.h:1209
#3  std::map<unsigned short, swss::VlanMemberEntry, std::less<unsigned short>, std::allocator<std::pair<unsigned short const, swss::VlanMemberEntry> > >::erase[abi:cxx11](std::_Rb_tree_iterator<std::pair<unsigned short const, swss::VlanMemberEntry> >) (__position=..., this=0x5636a518c8b0) at /usr/include/c++/12/bits/stl_map.h:1086
#4  PortsOrch::removeVlanMember (this=this@entry=0x5636a4f247e0, vlan=..., port=..., end_point_ip="") at ./orchagent/portsorch.cpp:7742
#5  0x00005636a395df8f in PortsOrch::doVlanMemberTask (this=this@entry=0x5636a4f247e0, consumer=...) at ./orchagent/portsorch.cpp:5947
#6  0x00005636a3971da0 in PortsOrch::doTask (this=0x5636a4f247e0, consumer=...) at ./orchagent/portsorch.cpp:6467
#7  0x00005636a3933ca1 in PortsOrch::doTask (this=0x5636a4f247e0) at ./orchagent/portsorch.cpp:6420
#8  0x00005636a38931da in OrchDaemon::start (this=this@entry=0x5636a4ee3560) at ./orchagent/orchdaemon.cpp:900
#9  0x00005636a37fe18a in main (argc=<optimized out>, argv=<optimized out>) at ./orchagent/main.cpp:800
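
A backtrace like the one above can be captured non-interactively. This is a sketch under assumptions: the container is named `swss` and has `gdb` and `pidof` installed, and the function name `orchagent_backtrace` and the `DRY_RUN` switch are made up for illustration.

```shell
# Sketch: dump a backtrace of orchagent from inside the swss container.
# Assumptions: a Docker container named "swss" with gdb and pidof available.
# orchagent_backtrace and DRY_RUN are hypothetical names.
orchagent_backtrace() {
    # Runs inside the container: attach to orchagent, print the backtrace, detach.
    local inner='gdb -p "$(pidof orchagent)" -batch -ex bt'
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "docker exec swss sh -c '$inner'"
    else
        docker exec swss sh -c "$inner"
    fi
}
```

Running `orchagent_backtrace` while orchagent is stuck should show whether it is still inside `PortsOrch::removeVlanMember`.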

Yang-Yongzhi, Sep 20 '24 07:09