Multicast group not appearing on Rendezvous Point router

Open stormshield-rlibaert opened this issue 2 years ago • 6 comments

Hello,

We may have found a bug where pimd, acting as RP, does not learn about a new route. We use pimd version 3.0-beta1.

We have the following topology: [image: pasted 2021-12-13 11-25]

  • pc_1 is the receiver
  • pc_2 is the source
  • firewall_2 is the RP (Rendezvous Point) router
  • firewall_3 is the DR (Designated Router)

When we start sending traffic with iperf, pimctl show mrt shows that a new multicast route is created on the DR, but not on the RP. Similarly, when the receiver is started, a new route is shown on both firewall_1 and the RP. However, as the RP cannot match the requested multicast group, no data is received on pc_1.

We investigated and found that this commit may have introduced the bug: https://github.com/troglobit/pimd/commit/edb0aac7326be7020a4da18b0291fb0fdd777960 Related issue: https://github.com/troglobit/pimd/pull/67

We tried the following patch: https://github.com/stormshield-rlibaert/pimd/commit/cd3f7e6332c1b4433ebe3ccb87fad7c85ba6be12 and it seems to fix the issue.

We would like your advice on whether this is a correct way to fix the problem. If so, should we open a merge request? Also, we are not sure if it is related, but it seems that the "interval" field of spt-threshold is ignored. What do you think?

stormshield-rlibaert avatar Dec 13 '21 17:12 stormshield-rlibaert

Hi,

you've stumbled into the last remaining blocker issue before 3.0 GA. I still have a few commits in my patch queue that I haven't pushed yet, and even some unstable fixes for what I believe is very close to your issue. I've been meaning to finalize this, and add a test case for it, but have not had the time. Hoping for Christmas break ...

A few questions:

  • Do you get this early after starting up the pimd instances, or does it happen after "a while"?
  • If it's the former, I believe you're seeing what I am, which seems to be lack of route updates on RP election changes
  • What multicast group(s) are you using for testing? The code you changed should only affect the reserved PIM-SSM group range 232.0.0.0/8
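
For reference, a minimal sketch in C of the range check behind that last question. The helper name in_ssm_range is hypothetical and is not taken from the pimd sources; it only assumes the reserved range 232.0.0.0/8 mentioned above.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of pimd: report whether an IPv4
 * group address falls inside the PIM-SSM range 232.0.0.0/8.
 *
 * Returns 1 if the group is in 232.0.0.0/8, 0 if it is outside,
 * and -1 if the string is not a valid dotted-quad IPv4 address.
 */
int in_ssm_range(const char *group)
{
    struct in_addr addr;

    /* inet_pton() returns 1 only on a successful conversion */
    if (inet_pton(AF_INET, group, &addr) != 1)
        return -1;

    /* Compare the first octet in host byte order: 232 == 0xe8 */
    return (ntohl(addr.s_addr) & 0xff000000u) == 0xe8000000u;
}
```

With this check, a test group such as 232.1.2.3 would count as SSM, while a group such as 239.1.1.1 would not, so a change limited to the SSM range should not affect a test run on the latter.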

troglobit avatar Dec 14 '21 06:12 troglobit

Hi,

Thank you for your fast answer! :smiley:

Indeed, we get this as soon as pimd is started (actually after a few seconds, just the time to switch terminals and launch commands). Our tests were performed in PIM-SM. I realize that this makes the fix kinda strange indeed. I'll go and take a look in the meantime.

stormshield-rlibaert avatar Dec 14 '21 10:12 stormshield-rlibaert

Looks like I was mistaken, the fix was not working on our side. For now, just reverting the suspected commit makes pimd work as we expect (for reference https://github.com/stormshield-rlibaert/pimd/commit/203b2a8e6df26aa8be52f1e6119ec6f367259a18). And it makes more sense too.

Edit: in our case find_routes returns mrt == NULL, so the call to switch_shortest_path must be made before checking the return value, otherwise it doesn't work.

stormshield-rlibaert avatar Dec 14 '21 16:12 stormshield-rlibaert

Yeah that's not right either, the SPT switchover should follow its established config. I've been sniffing around that same code myself.

The problem seems to be exactly what I've seen: when booting up, the routers try to elect an RP, and any PIM Joins sent before that has stabilized are prone to this problem. So my theory right now, if I remember correctly from a few months back when I was debugging this the last time, is that pimd doesn't resend all PIM Joins (and Leaves) after a new RP election. When that works, then we can have a closer look at reducing the convergence time.

troglobit avatar Dec 14 '21 19:12 troglobit

Hello Joachim,

I investigated the issue further and, as you expected, the proposed revert is not good. I found that the RP did create the (S,G) entry as I wanted, but it also started to send more registers than needed, exactly as described in https://github.com/troglobit/pimd/issues/128.

After some research, I came up with another solution, which is basically a fix for the issue I just mentioned: https://github.com/stormshield-rlibaert/pimd/commit/099748bb7476d1e12e19b587f9ca69b31c9b390c

This works pretty well for my test case. However, I am pretty certain that it has some unwanted side effects (PMBR, BorderBit, Null Register, ...). That being said, trying to fix the issue further would imply much more intrusive changes. I was thinking about reworking the receive_pim_register function to make it more like what we can find in FRRouting.

stormshield-rlibaert avatar Jan 11 '22 16:01 stormshield-rlibaert

Nice! Yeah I actually have a backlog of things I haven't pushed yet, some of it addressing issues you've mentioned here. Been busy with lots of other projects, however, so haven't got back to this in a while. I'd like to set up a test case for verifying convergence and such bits.

Please send a PR for stormshield-rlibaert@099748b; that looks much better than what we have now :)

Meanwhile, I'll reopen #128 so we can close it properly with your PR.

troglobit avatar Jan 11 '22 16:01 troglobit