sonic-swss

Fixes for various OA crashes derived from intf-overlapping and activation of link-local forwarding

Open rodnymolina opened this issue 7 years ago • 14 comments

From a functional standpoint, these are the changes made in this PR:

  • Added sanity-checking logic to OA to prevent overlapping interface prefixes from being pushed to asicdb/hardware.
  • Differentiated the processing/injection of "subnet" / "ip2me" / "bcast" interface-routes to cope with scenarios requiring independent handling (e.g. link-local, config errors).
  • Reduced kernel<->swss inconsistency by allowing OA to cope with overlapping subnets, and by enabling these to be resurrected should the primary go down (as the kernel does).
  • Enabled local-scope-based forwarding (link-local), which requires the overlap-prevention logic included in this PR.
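
As an illustration of the overlap check described above, here is a minimal Python sketch. It is not the OA implementation (which lives in C++); the function name `find_overlap` and the sample prefixes are assumptions made for this example only.

```python
import ipaddress


def find_overlap(new_prefix, existing_prefixes):
    """Return the first existing prefix whose subnet overlaps new_prefix, or None."""
    new_net = ipaddress.ip_interface(new_prefix).network
    for prefix in existing_prefixes:
        net = ipaddress.ip_interface(prefix).network
        # Prefixes of different IP versions can never overlap.
        if net.version == new_net.version and net.overlaps(new_net):
            return prefix
    return None


# Hypothetical interface addresses that are already configured.
configured = ["10.0.0.1/24", "fc00::1/64"]

print(find_overlap("10.0.0.129/25", configured))  # sits inside 10.0.0.0/24
print(find_overlap("10.0.1.1/24", configured))    # disjoint subnet
```

A check of this kind would run before an interface prefix is pushed down, so that a conflicting prefix is rejected or held back instead of reaching the hardware.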

From a coding perspective, the following changes were made as part of this patch:

  • Created a hashmap to store all interface-routes in the system (both "subnet" and "ip2me" entries are tracked).
  • Moved the subnet/ip2me/bcast route-creation logic out of the m_syncdIntfses struct, as it neither tracks ip2me state nor serves our purpose of enabling interface-route tracking for overlap prevention.
  • Adjusted/refactored the intfOrch::doTask() routines to simplify the logic dealing with intf creation.
  • Adjusted the intfmgr code to avoid hard-coding the scope of all incoming intf-addresses (currently all are defined as "global").
  • Did the same in neighsyncd to allow link-local neighbors to be passed down to SWSS.
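
The interface-route tracking described above could be sketched roughly as follows. This is a simplified Python model, not the C++ code in the patch: the class name `IntfRouteTable`, its methods, and the "pending" list are all hypothetical, chosen to show how an overlapped entry can be held back and later resurrected once the primary is removed.

```python
import ipaddress


class IntfRouteTable:
    """Hypothetical tracker of interface routes (subnet + ip2me per entry)."""

    def __init__(self):
        self.active = []   # (ifname, ip_interface) entries that would be programmed
        self.pending = []  # entries held back because they overlap an active one

    def _conflicts(self, net):
        # True if net overlaps the subnet of any active entry of the same IP version.
        return any(
            net.version == intf.network.version and net.overlaps(intf.network)
            for _, intf in self.active
        )

    def add(self, ifname, prefix):
        # intf.network is the "subnet" route, intf.ip is the "ip2me" address.
        entry = (ifname, ipaddress.ip_interface(prefix))
        if self._conflicts(entry[1].network):
            self.pending.append(entry)
            return False
        self.active.append(entry)
        return True

    def remove(self, ifname, prefix):
        """Remove an entry and return the pending entries that became active."""
        entry = (ifname, ipaddress.ip_interface(prefix))
        if entry in self.pending:
            self.pending.remove(entry)
            return []
        if entry not in self.active:
            return []
        self.active.remove(entry)
        restored = []
        for cand in list(self.pending):
            if not self._conflicts(cand[1].network):
                self.pending.remove(cand)
                self.active.append(cand)
                restored.append(cand)
        return restored


table = IntfRouteTable()
table.add("Ethernet0", "10.0.0.1/24")     # accepted as the primary
table.add("Ethernet4", "10.0.0.129/25")   # overlaps, so it is held back
print(table.remove("Ethernet0", "10.0.0.1/24"))  # the held-back entry is restored
```

Keeping the overlapped entries around, instead of dropping them, is what allows the table to mirror the kernel's behavior when the primary address goes away.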

Caveats:

  • This code handles all ip-address additions/changes pushed from the user to SWSS through the regular configDB/intfmgrd pipeline. In other words, the default "link-local" addresses defined by the kernel are not implicitly injected into SWSS; only explicitly configured link-local addresses are pushed down by this code. The existing PR/774 can be adjusted to use this code to complete that task.
  • Neither 'neighbor' nor 'host' entries are created in HW for link-local peers. This avoids problems in scenarios where the same link-local ipv6 address is learned from more than one L3 interface. After testing the latest SAI 3.3, I can see that new functionality has been added, but there is still some work left on the SONiC side (sai-redis/meta layer) to complete this.
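
To illustrate what deriving the scope from the address (instead of hard-coding "global") might look like, here is a small Python sketch. The helper name `address_scope` and the "link" label are assumptions for this example; the actual intfmgr/neighsyncd code is C++ and may classify scopes differently.

```python
import ipaddress


def address_scope(prefix):
    """Return "link" for link-local addresses and "global" otherwise (simplified)."""
    addr = ipaddress.ip_interface(prefix).ip
    return "link" if addr.is_link_local else "global"


print(address_scope("fe80::1/64"))       # link-local IPv6 address
print(address_scope("2001:db8::1/64"))   # not link-local
print(address_scope("10.0.0.1/24"))      # not link-local
```

The sketch only distinguishes two scopes; other scopes (such as host scope) are deliberately left out.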

The following UTs have been created:

    test_interface.py::TestRouterInterfaceOverlap::test_IPv4InterfacePartialOverlap PASSED [  8%]
    test_interface.py::TestRouterInterfaceOverlap::test_IPv4InterfacePartialOverlapReversed PASSED [ 16%]
    test_interface.py::TestRouterInterfaceOverlap::test_IPv4InterfaceFullOverlap PASSED [ 25%]
    test_interface.py::TestRouterInterfaceOverlap::test_IPv4InterfaceFullOverlapReversed PASSED [ 33%]
    test_interface.py::TestRouterInterfaceOverlap::test_IPv6InterfacePartialOverlap PASSED [ 41%]
    test_interface.py::TestRouterInterfaceOverlap::test_IPv6InterfacePartialOverlapReversed PASSED [ 50%]
    test_interface.py::TestRouterInterfaceOverlap::test_IPv6InterfaceFullOverlap PASSED [ 58%]
    test_interface.py::TestRouterInterfaceOverlap::test_IPv6InterfaceFullOverlapReversed PASSED [ 66%]
    test_interface.py::TestRouterInterfaceOverlap::test_IPv6LinkLocalInterfacePartialOverlap PASSED [ 75%]
    test_interface.py::TestRouterInterfaceOverlap::test_IPv6LinkLocalInterfacePartialOverlapReversed PASSED [ 83%]
    test_interface.py::TestRouterInterfaceOverlap::test_IPv6LinkLocalInterfaceFullOverlap PASSED [ 91%]
    test_interface.py::TestRouterInterfaceOverlap::test_IPv6LinkLocalInterfaceFullOverlapReversed PASSED

rodnymolina Jan 27 '18 01:01

Code re-tested with latest images.

rodnymolina Feb 14 '18 01:02

Code is failing to build due to a dependency on PR/187: https://github.com/Azure/sonic-swss-common/pull/187

rodnymolina Feb 15 '18 04:02

All checks are passing now since dependencies are met.

rodnymolina Feb 15 '18 18:02

As we discussed, to address the ipv6 link-local issue, we need to ask the vendor to implement the NO_HOST_ENTRY option. I think that is a prerequisite to merging this PR, right?

lguohan Feb 16 '18 17:02

The problematic scenario that we discussed can potentially arise when we are dealing with link-local + VLAN subinterfaces. In this code we prevent that problem by skipping the creation of 'conflicting' neighbor-entries in VLAN scenarios. Traffic directed to such a neighbor-entry will still get forwarded when initiated by the local router.

The problem shouldn't affect non-VLAN/regular setups, given that link-local addresses are derived from each switch's mac-address, so there's no room for overlapping ipv6 neighbor-entries (unless the user deliberately configures a bogus/duplicated entry).

We will remove this link-local-over-VLAN limitation from our code once we get vendor support for the "NO_HOST_ENTRY" feature.
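
For reference, the mac-address derivation mentioned above can be sketched in Python as below, assuming the modified EUI-64 scheme (RFC 4291, Appendix A); a system may be configured to generate link-local addresses differently. The function name and the sample MAC are chosen purely for illustration.

```python
import ipaddress


def link_local_from_mac(mac):
    """Build the modified EUI-64 link-local address for a colon-separated MAC."""
    octets = [int(part, 16) for part in mac.split(":")]
    octets[0] ^= 0x02  # flip the universal/local bit of the first octet
    # Insert ff:fe between the two halves of the MAC to form the interface ID.
    eui64 = octets[:3] + [0xFF, 0xFE] + octets[3:]
    # fe80::/64 prefix (8 bytes) followed by the 8-byte interface ID.
    packed = bytes([0xFE, 0x80] + [0] * 6 + eui64)
    return ipaddress.IPv6Address(packed)


# Illustrative MAC address (an assumption, not taken from this thread).
print(link_local_from_mac("02:42:ac:11:00:02"))  # fe80::42:acff:fe11:2
```

Because the interface ID embeds the MAC, two devices with distinct MACs end up with distinct link-local addresses under this scheme.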

rodnymolina Feb 17 '18 01:02

Let's split this PR into two parts: one to handle the overlapping case for the same or different interfaces, and the other to let the link-local messages go to swss and HW. We should also get this fix in: https://github.com/Azure/sonic-swss/pull/370 so we can completely remove the logic in intfsorch.cpp for dealing with the ifconfig weirdness.

zhenggen-xu Apr 20 '18 05:04

Yes, I'm planning to send a new patch on this one to deactivate link-local processing for the time being; it will be re-enabled later on, once we have the missing sai support. This way we could have one uniform logic to deal with all interface overlaps in the system.

Now, #370 by itself doesn't seem to fix all the issues that the current "ifconfig" logic is dealing with. I need to do a bit more research in this area.

rodnymolina Apr 20 '18 05:04

This PR is needed to fix the link-local overlap error (?):

Jan 18 01:35:12.569456 ERR #orchagent: :- : object key SAI_OBJECT_TYPE_ROUTE_ENTRY:{"dest":"fe80::42:acff:fe11:2/128","switch_id":"oid:0x21000000000000","vr":"oid:0x3000000000022"} already exists
Jan 18 01:35:12.569470  ERR #orchagent: :- Failed to create IP2me route ip:fe80::42:acff:fe11:2, rv:-6
Jan 18 01:35:12.569562  ERR #orchagent: :- Failed due to exception: Failed to create IP2me route.

jipanyang Jan 17 '19 19:01

@jipanyang I'm finishing up the code conflicts. Will re-test when done and generate a patch update.

rodnymolina Feb 11 '19 22:02

@rodnymolina @jipanyang @lguohan Is this PR getting updated?

nikos-github Feb 21 '19 20:02

@nikos-github Yes, the code is mostly ready (there were many conflicts with the original patch); I'm dealing with a few failing UTs. The PR will be updated within the next few days.

rodnymolina Feb 21 '19 20:02

@lguohan @jipanyang can you guys please look into this one asap? Thanks.

rodnymolina Mar 01 '19 21:03

retest this please

zhenggen-xu Apr 11 '19 22:04

What is the status here? It would be great to have this feature!

MalteJ Mar 25 '20 17:03