RIOT icon indicating copy to clipboard operation
RIOT copied to clipboard

gnrc_ipv6_nib: Force unspecified next hop addresses

Open xnumad opened this issue 1 year ago • 16 comments

Reserve unspecified address in NCEs for on-link prefixes, to have _PFX_ON_LINK => ipv6_addr_is_unspecified(dst->next_hop) be true

Contribution description

In nib_[onl|offl]_entry_t, a next_hop of NULL (as used by prefix list) shouldn't be interpreted as "don't care" but as an explicit "none, not applicable", because the next_hop value is used for association with a router at https://github.com/RIOT-OS/RIOT/blob/270aa7012fab0ceadeb0b82546ea06dbfe9c0902/sys/net/gnrc/network_layer/ipv6/nib/nib.c#L1469

The code intends to delete all off-link prefixes where next hop is the specified router [1], but (without this PR) it actually also deletes on-link prefixes on the same interface [2] because they falsely use the same next hop. (And are only distinguished as on-link through _PFX_ON_LINK flag, which however is not checked for in this place.)

(Note that on-link prefixes are added again when processing a RA Prefix Information Option, which can be in the same Router Advertisement that caused _handle_rtr_timeout and thereby their accidental deletion. This seems to be common and makes this deletion hard to notice.)

[1] which is the wrong thing to do in the first place, and was therefore removed in 96480008caaed8fa6af80a3afc1d6cb736287a42, thereby making this bug theoretical only [2] such as "the /128 prefix used for managing a temporary address" in #20369 (affected)

Testing procedure

Tests: #20372

xnumad avatar Feb 11 '24 22:02 xnumad

Murdock results

:x: FAILED

3ab7896b1880343f979915c18a7f1698f537639b gnrc_ipv6_nib: Force unspecified next hop addresses

Success Failures Total Runtime
10193 0 10195 14m:06s

Artifacts

riot-ci avatar Feb 12 '24 10:02 riot-ci

Now some tests are failing, is this intended?

benpicco avatar Feb 12 '24 13:02 benpicco

Now some tests are failing, is this intended?

I would guess not. I can only tell so far that we fail because _nib_onl_alloc() fails with next_hop == NULL.

_nib-internal.c:509

         dst->next_hop = _nib_onl_alloc(next_hop, iface);

        if (dst->next_hop == NULL) {
            memset(dst, 0, sizeof(_nib_offl_entry_t));
            return NULL;
        }

fabian18 avatar Feb 12 '24 14:02 fabian18

Maybe no neighbor cache entry should be allocated at all if no next hop address is known.

Neighbor Cache: A set of entries about individual neighbors to which traffic has been sent recently. Entries are keyed on the neighbor's on-link unicast IP address and contain such information as its link-layer address, a flag indicating whether the neighbor is a router or a host (called IsRouter in this document), a pointer to any queued packets waiting for address resolution to complete, etc. A Neighbor Cache entry also contains information used by the Neighbor Unreachability Detection algorithm, including the reachability state, the number of unanswered probes, and the time the next Neighbor Unreachability Detection event is scheduled to take place.

If there is no IPv6 address for a neighbor and neighbors are keyed by their address, why would we want to add a neighbor cache entry at all?

fabian18 avatar Feb 12 '24 14:02 fabian18

Now some tests are failing, is this intended?

No, but turns out they test in a wrong way. :D 🤯 Opened #20377

xnumad avatar Feb 12 '24 15:02 xnumad

I would guess not. I can only tell so far that we fail because _nib_onl_alloc() fails with next_hop == NULL.

It feels like the test was working by coincidence. We create one NCE for the default router from the RA. Then we process the PIO and add a prefix. For that we want to add a NCE with (next_hop = NULL) and iface = 5. So for a prefix, the NCE only stores the interface on which the Prefix has been received. The NCE is found because the NCE exists from the default router but it is only matched by the interface and just because address is NULLreturns true in _addr_equals. There could have been any other Neighbor with interface 5 in the NCE, and I think this would create a wrong reference.

I think it should not be an NCE without an address just to store an interface number for a prefix.

fabian18 avatar Feb 13 '24 10:02 fabian18

      Prefix List  - A list of the prefixes that define a set of
                     addresses that are on-link.  Prefix List entries
                     are created from information received in Router
                     Advertisements.  Each entry has an associated
                     invalidation timer value (extracted from the
                     advertisement) used to expire prefixes when they
                     become invalid.  A special "infinity" timer value
                     specifies that a prefix remains valid forever,
                     unless a new (finite) value is received in a
                     subsequent advertisement.

                     The link-local prefix is considered to be on the
                     prefix list with an infinite invalidation timer
                     regardless of whether routers are advertising a
                     prefix for it.  Received Router Advertisements
                     SHOULD NOT modify the invalidation timer for the
                     link-local prefix.

      Default Router List
                   - A list of routers to which packets may be sent.
                     Router list entries point to entries in the
                     Neighbor Cache; the algorithm for selecting a
                     default router favors routers known to be reachable
                     over those whose reachability is suspect.  Each
                     entry also has an associated invalidation timer
                     value (extracted from Router Advertisements) used
                     to delete entries that are no longer advertised.

Conceptionally the DRL and PL are two different data structures. ~~However, we aggregate them together as offlink entries.~~ (sorry offlink is _PFX, _DC and _FT) For entries in the DRL it is explicitly stated that a DR references an NCE, but no word about it for prefixes.

fabian18 avatar Feb 13 '24 11:02 fabian18

If this causes a test to fail that was testing the wrong thing, you also need to add the fix to the test in this PR, otherwise there is no way to merge this.

benpicco avatar Jul 29 '24 14:07 benpicco

Thanks for the heads up. The erroneous test (https://github.com/RIOT-OS/RIOT/issues/20377) is already fixed in master. Therefore I rebased.

xnumad avatar Jul 29 '24 18:07 xnumad

Squashed your suggested change.

xnumad avatar Jul 29 '24 18:07 xnumad

CI finds that tests/unittests is now failing

benpicco avatar Jul 30 '24 08:07 benpicco