RTPROT_KERNEL Causing IPv4PodCIDR Route Discovery Failure in Quagga and FRR

Open • tkgeng opened this issue 2 years ago • 6 comments

Is there an existing issue for this?

  • [x] I have searched the existing issues

What happened?

We are using native-routing mode with FRR's OSPF to announce pod subnets. This setup worked well until we upgraded Cilium from 1.13 to 1.14. After the upgrade, pod network announcements failed because the pod subnet routes are now installed with proto kernel. These routes are visible with ip r on the host, but they are absent from Zebra's table (they do not appear in show ip route in vtysh).
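To check which routes are affected, the protocol tag can be read straight from ip route output. The Go sketch below is a minimal illustration, not part of Cilium or FRR: it assumes iproute2's default output, which omits "proto boot" (RTPROT_BOOT, the default for routes added without an explicit protocol), and it encodes only the behavior described in this thread, namely that zebra does not import routes tagged proto kernel. The route lines and the helper names routeProto and skippedByZebra are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// routeProto returns the protocol name from one line of `ip route show`
// output. iproute2 omits "proto boot" in its default output, so a line
// without a "proto" keyword is reported as "boot".
func routeProto(line string) string {
	fields := strings.Fields(line)
	for i := 0; i+1 < len(fields); i++ {
		if fields[i] == "proto" {
			return fields[i+1]
		}
	}
	return "boot"
}

// skippedByZebra reports whether the route is tagged "proto kernel",
// which, per this thread, zebra does not import into its table.
func skippedByZebra(line string) bool {
	return routeProto(line) == "kernel"
}

func main() {
	// Hypothetical PodCIDR routes; the addresses are made up.
	before := "10.0.1.0/24 via 10.0.1.10 dev cilium_host src 10.0.1.10"
	after := "10.0.1.0/24 via 10.0.1.10 dev cilium_host proto kernel src 10.0.1.10"
	fmt.Println(skippedByZebra(before), skippedByZebra(after)) // false true
}
```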

While I understand that proto kernel is meant to address issue #24288, would it be helpful to introduce a configuration option to enable it selectively?

Of course, the underlying reason FRR fails to detect proto kernel routes is a concern in itself. I noticed this has been raised before, but the discussion remains unanswered: https://github.com/FRRouting/frr/discussions/14731.

Cilium Version

1.14+

tkgeng • Mar 08 '24 06:03

Yeah, FRR ignores proto kernel routes, and there's no way to disable that behavior. I recently learned that some people rely on the route installed by Cilium (the one that redirects traffic to the cilium_host device) to advertise the PodCIDR with routing protocols (I know of one more case using BGP).

As far as I know, Cilium never intended this route to be used that way, so backward compatibility has never been guaranteed. Also, this only works with certain routing configurations (for example, I think it doesn't work with endpointRoutes.enabled=true). If you wish to rely on it, you would first need to reach consensus on keeping it as a stable interface.

YutaroHayakawa • Mar 08 '24 08:03

Hey @tkgeng, we are using this patch locally :) because in some environments we run BIRD, which has a similar problem.

```diff
--- a/pkg/datapath/loader/loader.go
+++ b/pkg/datapath/loader/loader.go
@@ -18,7 +18,6 @@ import (
        "github.com/cilium/cilium/pkg/bpf"
        "github.com/cilium/cilium/pkg/byteorder"
        "github.com/cilium/cilium/pkg/datapath/link"
-       "github.com/cilium/cilium/pkg/datapath/linux/linux_defaults"
        "github.com/cilium/cilium/pkg/datapath/linux/route"
        "github.com/cilium/cilium/pkg/datapath/loader/metrics"
        datapath "github.com/cilium/cilium/pkg/datapath/types"
@@ -100,7 +99,6 @@ func upsertEndpointRoute(ep datapath.Endpoint, ip net.IPNet) error {
                Prefix: ip,
                Device: ep.InterfaceName(),
                Scope:  netlink.SCOPE_LINK,
-               Proto:  linux_defaults.RTProto,
        }
```

I believe we could find a way to make this configurable while keeping the current behavior as the default.
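One way such an option could look is sketched below in Go. It is purely illustrative: endpointRouteOptions, KernelProto and endpointRouteProto are hypothetical names, not existing Cilium options or APIs. The sketch keeps proto kernel (RTPROT_KERNEL, value 2 in <linux/rtnetlink.h>) as the default and returns 0 when the operator opts out. Returning 0 mirrors the patch above, which leaves the Proto field unset. The assumption is that the route then falls back to the netlink library's default protocol (RTPROT_BOOT), which FRR and BIRD picked up before 1.14 according to this thread.

```go
package main

import "fmt"

// rtprotKernel is RTPROT_KERNEL from <linux/rtnetlink.h>.
const rtprotKernel = 2

// endpointRouteOptions is a hypothetical knob; Cilium has no option by
// this name. KernelProto=true reproduces the 1.14 behavior.
type endpointRouteOptions struct {
	KernelProto bool
}

// defaultEndpointRouteOptions keeps the current behavior unless the
// operator explicitly opts out.
func defaultEndpointRouteOptions() endpointRouteOptions {
	return endpointRouteOptions{KernelProto: true}
}

// endpointRouteProto returns the value to put in the route's Proto field.
// Zero means "leave it unset", as the patch above does; the route is then
// assumed to get the netlink library's default protocol (RTPROT_BOOT).
func endpointRouteProto(opts endpointRouteOptions) int {
	if opts.KernelProto {
		return rtprotKernel
	}
	return 0
}

func main() {
	fmt.Println(endpointRouteProto(defaultEndpointRouteOptions()))           // 2
	fmt.Println(endpointRouteProto(endpointRouteOptions{KernelProto: false})) // 0
}
```

Defaulting KernelProto to true means clusters that depend on the protection from issue #24288 keep it, and only users who advertise the route with FRR or BIRD would need to opt out.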

oblazek • Mar 08 '24 08:03

@oblazek Thank you very much. Since you're already running it this way, I don't need to worry about side effects. I will give it a try as well.

tkgeng • Mar 08 '24 10:03

Another option that doesn't require a patch would be to peer Cilium's BGP with FRR over a loopback address. Recent versions of FRR support allow-reserved-ranges, which should allow you to specify 127.0.0.1 as the peer address. Then, on FRR, you would just redistribute everything into OSPF and re-advertise it down into the network.

networkop • Mar 11 '24 09:03

@networkop Good idea! It will be one of the options we consider next.

tkgeng • Mar 25 '24 03:03

Thanks @networkop, it worked with FRR 10 on my side.

robertvolkmann • May 08 '24 10:05

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

github-actions[bot] • Jul 08 '24 01:07

This issue has not seen any activity since it was marked stale. Closing.

github-actions[bot] • Jul 22 '24 01:07