frr icon indicating copy to clipboard operation
frr copied to clipboard

bgpd: Revalidate locally originated routes also when RPKI state changes

Open ton31337 opened this issue 1 year ago • 19 comments

If we have something like:

router bgp 65001
 network 10.10.10.0/24 route-map rpki
!
route-map rpki permit 10
 match rpki valid
 set local-preference 150
route-map rpki permit 20
 match rpki notfound
 set local-preference 50
!

Then 10.10.10.0/24 is never revalidated when RPKI state changes. E.g. if it was valid, and moves to notfound => local-preference remains 150, but should be 50.

With this patch we force BGP network (static) routes to be revalidated when revalidation event is triggered from RPKI module.

Fixes https://github.com/FRRouting/frr/issues/16474

ton31337 avatar Jul 26 '24 13:07 ton31337

I know we have other places in the code where we have timers to regenerate data in BGP but I am personally not happy about that approach as that it adds unnecessary load to already loaded systems when there are a bunch of prefixes in the system. Is there a way we can have routes revalidated automatically? what is going on that we have to set a timer?

donaldsharp avatar Jul 29 '24 12:07 donaldsharp

what is going on that we have to set a timer?

Routes are never revalidated actually (including static routes (network ...), redistributed routes, ...). When a callback is received from librtr, we validate per-prefix and per-neighbor (using soft inbound) only.

ton31337 avatar Jul 29 '24 12:07 ton31337

waiting on https://github.com/FRRouting/frr/issues/16474 to be tested to make certain it resolves this issue

riw777 avatar Jul 30 '24 14:07 riw777

https://github.com/FRRouting/frr/issues/16474 is fixed using this approach. @donaldsharp do you have any other solution or we can go with this?

ton31337 avatar Jul 30 '24 17:07 ton31337

this might need to be redesigned so it's not timer based ...

It's not really possible easily. RPKI cache server sends only changed prefixes. E.g. using StayRTR:

{
  "roas": [
    {
      "prefix": "100.100.100.100/32",
      "maxLength": 32,
      "asn": "AS65001"
    }
  ]
}

If I change to:

{
  "roas": [
    {
      "prefix": "100.100.100.101/32",
      "maxLength": 32,
      "asn": "AS65001"
    }
  ]
}

Then 100.100.100.101/32 will be sent as a prefix to be validated. If this route does not exist in the RIB, then no changes will be triggered and again we have a stale state.

With the timer, it's deterministic, we always know we have the right state eventually. It can be controlled with high timer values and does not need to be every 30 seconds.

UPDATE: Opened an issue, but I doubt it's even possible because it's more like up to the RPKI cache server to send those removed prefixes.

ton31337 avatar Jul 31 '24 20:07 ton31337

@donaldsharp are we still looking into another way to do this, or should we go with this commit for the moment?

riw777 avatar Aug 27 '24 14:08 riw777

I would really really prefer that we fix this problem the correct way the first time. I don't think we are in too much of a hurry since this is a day one problem with the rpki code.

donaldsharp avatar Aug 28 '24 11:08 donaldsharp

This pull request has conflicts, please resolve those before we can evaluate the pull request.

github-actions[bot] avatar Sep 18 '24 11:09 github-actions[bot]

Where are we on this? Still waiting on a better way to resolve it?

riw777 avatar Nov 05 '24 14:11 riw777

TBH, there is no other proper way of handling this other than scanning and revalidating. For the same reason, RPKI cache server daemons just do not send "removed" prefixes, and rtrlib is just a middleware here.

ton31337 avatar Nov 11 '24 07:11 ton31337

This pull request has conflicts, please resolve those before we can evaluate the pull request.

github-actions[bot] avatar Dec 19 '24 22:12 github-actions[bot]

@Mergifyio rebase

riw777 avatar Jan 07 '25 13:01 riw777

rebase

❌ Pull request can't be updated with latest base branch changes

Mergify needs the author permission to update the base branch of the pull request. @opensourcerouting needs to authorize modification on its head branch.

mergify[bot] avatar Jan 07 '25 13:01 mergify[bot]

We need to figure out what to do with this ... do we merge it as is, or do we wait on something from the RPKI folks to resolve it?

riw777 avatar Jan 28 '25 15:01 riw777

Russ let's bring it up during today's meeting

donaldsharp avatar Jan 28 '25 15:01 donaldsharp

Waiting on RPKI folks to answer whether this will be fixed on their end or not ...

riw777 avatar Feb 04 '25 11:02 riw777

I assume we're still waiting on the RPKI folks here?

riw777 avatar Mar 18 '25 12:03 riw777

Yeah...

ton31337 avatar Mar 18 '25 12:03 ton31337

Should we try to have a side meeting with the RPKI folks to get this one pushed?

riw777 avatar Jun 03 '25 12:06 riw777

Anything happening here?

riw777 avatar Jul 15 '25 12:07 riw777

Can someone from FRR please reopen and merge this? It's still a problem, at least in nightly VyOS builds.

dhess avatar Oct 11 '25 13:10 dhess

It's possible to reopen, but we don't have an agreement here. @donaldsharp, what do you think if we do this with a separate configuration knob?

ton31337 avatar Oct 11 '25 14:10 ton31337