frr
frr copied to clipboard
isisd: Fix memory leaks when the transition of neighbor state from non-UP to DOWN
When receiving a hello packet, if the neighbor state transitions directly from a non-ISIS_ADJ_UP state (such as ISIS_ADJ_INITIALIZING) to ISIS_ADJ_DOWN state, the neighbor entry cannot be deleted. If the neighbor is removed or the neighbor's System ID changes, it may result in memory leakage in the neighbor entry.
Test Scenario: LAN link between Router A and Router B is established. Router A does not configure neighbor authentication, while Router B is configured with neighbor authentication. When the neighbor entry on Router B ages out, the neighbor state on Router A transitions to INIT. If Router B is then removed, the neighbor state on Router A transitions to DOWN and persists.
Signed-off-by: zhou-run [email protected]
Does the failed test make sense, or in other words, should the neighbor not be deleted when transitioning from UP to DOWN state? https://ci1.netdef.org/browse/FRR-PULLREQ3-TOPO5U18ARM8-2673 https://ci1.netdef.org/browse/FRR-PULLREQ3-TOPO8U18AMD64-2673 https://ci1.netdef.org/browse/FRR-PULLREQ3-ASAN7-2673 https://ci1.netdef.org/browse/FRR-PULLREQ3-TOPO1DEB10AMD64-2673
Does the failed test make sense, or in other words, should the neighbor not be deleted when transitioning from UP to DOWN state? https://ci1.netdef.org/browse/FRR-PULLREQ3-TOPO5U18ARM8-2673 https://ci1.netdef.org/browse/FRR-PULLREQ3-TOPO8U18AMD64-2673 https://ci1.netdef.org/browse/FRR-PULLREQ3-ASAN7-2673 https://ci1.netdef.org/browse/FRR-PULLREQ3-TOPO1DEB10AMD64-2673
I re-run the failed test to be sure it is not a false positive.
Now, regarding your question, the neighbor is created when you add it in the configuration and when IS-IS Hello messages are received on the given interface and are deleted when you remove it in the configuration. I think, we must keep the context when the interface goes from UP to DOWN to speed up the process when the interface goes from DOWN to UP. In particular, when an interface (frequent when optical connector SFP reach end of life) flap (going Down, then Up, then Down ...), if you remove the context, the code will delete dynamic memory allocation and then recreate it and so on. This could decrease performance and at the end the risk of a memory corruption. So, I think it is preferable to keep the context if the interface goes DOWN and only remove the context when the neighbor is removed from the configuration
Does the failed test make sense, or in other words, should the neighbor not be deleted when transitioning from UP to DOWN state? https://ci1.netdef.org/browse/FRR-PULLREQ3-TOPO5U18ARM8-2673 https://ci1.netdef.org/browse/FRR-PULLREQ3-TOPO8U18AMD64-2673 https://ci1.netdef.org/browse/FRR-PULLREQ3-ASAN7-2673 https://ci1.netdef.org/browse/FRR-PULLREQ3-TOPO1DEB10AMD64-2673
I re-run the failed test to be sure it is not a false positive.
Now, regarding your question, the neighbor is created when you add it in the configuration and when IS-IS Hello messages are received on the given interface and are deleted when you remove it in the configuration. I think, we must keep the context when the interface goes from UP to DOWN to speed up the process when the interface goes from DOWN to UP. In particular, when an interface (frequent when optical connector SFP reach end of life) flap (going Down, then Up, then Down ...), if you remove the context, the code will delete dynamic memory allocation and then recreate it and so on. This could decrease performance and at the end the risk of a memory corruption. So, I think it is preferable to keep the context if the interface goes DOWN and only remove the context when the neighbor is removed from the configuration
Does 'remove the context when the neighbor is removed from the configuration' refer to the command 'clear isis neighbor [WORD]'?
Does 'remove the context when the neighbor is removed from the configuration' refer to the command 'clear isis neighbor [WORD]'? No. I'm referring to CLI
no isis router WORDat the interface level which delete the IS-IS configuration on a given interface.
this says there are linter problems, but it doesn't show that the are ... let's try rerunning ci to see if we can find them
ci:rerun
Does 'remove the context when the neighbor is removed from the configuration' refer to the command 'clear isis neighbor [WORD]'? No. I'm referring to CLI
no isis router WORDat the interface level which delete the IS-IS configuration on a given interface.
Does this mean that when the IS-IS configuration is not removed on the specified interface, we do not need to consider potential memory leaks caused by frequent changes in the System ID due to neighbor updates? Also, if neighbors are not deleted, should the neighbor state transition from DOWN to UP again also ensure that it goes through the INITIALIZING state?
@Mergifyio backport stable/10.0
backport stable/10.0
✅ Backports have been created
- #16327 isisd: Fix memory leaks when the transition of neighbor state from non-UP to DOWN (backport #15716) has been created for branch
stable/10.0
@frrbot rereview
Squash to a single commit all 3 commits.
Should I create a new Pull Request to merge the commits? Because I can't find this Pull Request in my repository, it is in the FRRouting/frr repository, but I can't find the "Merge pull request" button.
Squash to a single commit all 3 commits.
Done.
Does 'remove the context when the neighbor is removed from the configuration' refer to the command 'clear isis neighbor [WORD]'? No. I'm referring to CLI
no isis router WORDat the interface level which delete the IS-IS configuration on a given interface.Does this mean that when the IS-IS configuration is not removed on the specified interface, we do not need to consider potential memory leaks caused by frequent changes in the System ID due to neighbor updates? Also, if neighbors are not deleted, should the neighbor state transition from DOWN to UP again also ensure that it goes through the INITIALIZING state?
are we still waiting on an answer to this before pushing?
should this be labeled a bugfix? it seems like it should ...
No. I'm referring to CLI
no isis router WORDat the interface level which delete the IS-IS configuration on a given interface.Does this mean that when the IS-IS configuration is not removed on the specified interface, we do not need to consider potential memory leaks caused by frequent changes in the System ID due to neighbor updates? Also, if neighbors are not deleted, should the neighbor state transition from DOWN to UP again also ensure that it goes through the INITIALIZING state?
I tried the scenario of a P2P link. If the link goes DOWN without deleting the neighbor table entry, the new neighbor cannot be established because the old neighbor is still occupying the slot, unless ISIS is deleted and reconfigured. @odd22 @riw777
I tried the scenario of a P2P link. If the link goes DOWN without deleting the neighbor table entry, the new neighbor cannot be established because the old neighbor is still occupying the slot, unless ISIS is deleted and reconfigured. @odd22 @riw777
OK. So, in this case, we must delete the neighbor context to be sure that the adjacency could goes UP when the link goes UP again. Just to be sure, when you perform the test, you shutdown the interface and then no shut the interface ? You don't try to change the neighbor or the configuration of the interface ?
I just perform a test on one of my router: set the interface down with ìp link set down dev ethX` check that we lost the IS-IS adjacency and thus the neighbor. Then, set the link up again and we get back the IS-IS neighbor. I perform the test on a standard FRR. Not compiled with your patch. So, the original code handle correctly link failure when interface is P2P. So, in case of the state is not UP, I think you are right and neighbor should be removed to handle the memory leak and also the case where the adjacency is UP and goes down, then UP again even if it consume some CPU/Memory allocation in case of intermittent link failure.
I tried the scenario of a P2P link. If the link goes DOWN without deleting the neighbor table entry, the new neighbor cannot be established because the old neighbor is still occupying the slot, unless ISIS is deleted and reconfigured. @odd22 @riw777
OK. So, in this case, we must delete the neighbor context to be sure that the adjacency could goes UP when the link goes UP again. Just to be sure, when you perform the test, you shutdown the interface and then no shut the interface ? You don't try to change the neighbor or the configuration of the interface ?
This is an exceptional scenario. I did not shutdown the interface and then no shut the interface, but due to failed adjacency authentication, the neighbor became unidirectional. After the IS table entry for the neighbor aged out, it relearned the INIT entry for the neighbor. At this point, when I modified the neighbor's IS net, it resulted in the new neighbor unable to establish adjacency.
what requested change are we waiting on here?
what requested change are we waiting on here?
I don't know either. @ton31337