pyroute2
Certain routing entries not removed when IP address is deleted from an interface
Run the following code to reproduce the error:
from pyroute2 import IPDB
import os
ip = IPDB()
interface = ip.create(ifname="test0", kind="dummy")
try:
    interface.add_ip("192.168.71.1/24")
    interface.up()
    interface.commit()
    ip.routes.add(dst="192.168.72.0/24", gateway="192.168.71.254").commit()
    interface.del_ip("192.168.71.1/24").commit()
    interface.down()
    interface.commit()
    for route in ip.routes:
        if route.oif == interface.index:
            print(route["gateway"], route["dst"])
    print("---")
    os.system("ip route show dev test0")
finally:
    interface.remove().commit()
What it does is the following:
- Configures a dummy interface with one IP address
- Adds a route that uses this interface as its output
- Removes the IP address from the interface and brings the interface down.
All of the above works fine, but when the routing table is queried, pyroute2 still reports the routing entry created in step 2, although in fact it no longer exists because of step 3. As proof, the script also reads the routing table with the ip command, which confirms that there is no route for the dummy interface at all.
What I would expect is that the routing entries in pyroute2 are up-to-date.
Weird, but true. Fixing.
The root cause is found: the kernel doesn't send route updates when it removes dependent routes, so IPDB has to calculate them itself, just as it does upon an interface removal. Not good, but doable anyway. Will be fixed tonight.
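To illustrate what "calculating" the dependent routes could look like, here is a minimal sketch in plain Python. It is not IPDB's actual implementation: the function name `dependent_routes` and the dict-based route records are hypothetical, and it assumes a simplified rule (a route is stale if its gateway was only reachable through the subnet of the removed address, with all addresses belonging to the same interface).

```python
import ipaddress


def dependent_routes(routes, removed_addr, remaining_addrs):
    """Return the routes whose gateway was reachable only via removed_addr.

    Hypothetical helper: routes are plain dicts with "dst" and "gateway",
    addresses are strings in CIDR notation such as "192.168.71.1/24".
    """
    removed_net = ipaddress.ip_interface(removed_addr).network
    remaining = [ipaddress.ip_interface(a).network for a in remaining_addrs]
    stale = []
    for route in routes:
        gateway = route.get("gateway")
        if gateway is None:
            continue  # no gateway, so nothing depends on the address
        gateway = ipaddress.ip_address(gateway)
        # Stale only if the gateway sat in the removed subnet and no
        # remaining address still covers it.
        if gateway in removed_net and not any(gateway in n for n in remaining):
            stale.append(route)
    return stale


routes = [
    {"dst": "192.168.72.0/24", "gateway": "192.168.71.254"},
    {"dst": "10.0.0.0/8", "gateway": "172.16.0.1"},
]
stale = dependent_routes(routes, "192.168.71.1/24", [])
print([r["dst"] for r in stale])  # ['192.168.72.0/24']
```

With the addresses from the bug report, only the 192.168.72.0/24 route is flagged. The kernel's own rules are more involved than this, so the result would still need to be checked against the system.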
Thanks for your efforts! I also did some debugging by logging the netlink messages, and I don't really understand why the kernel is not sending the RTM_DELROUTE message for these routes. I was thinking of re-reading the routing table in this case (either automatically or manually), but if it is possible to calculate the changes, that's also good of course. Also, one note: the main issue is not with the interface removal but IP address removal.
the main issue is not with the interface removal but IP address removal.
Yep. I only mentioned the interface removal because I had already tried to find a proper fix for it, and so far there is only a workaround (it kind of works, but it is still nothing more than a workaround). The time has finally come to fix it, and it's really great that you filed this issue; it helps. Thanks!
I will try to avoid re-reading all the routes, since the routing table may be really huge and this may take too much time.
An attempt to mitigate the issue. For now it covers IPv4 only; the same issue with IPv6 routes is still not fixed. This fix may also cause a significant slowdown with huge numbers of routes, but I hope to solve that too. At least a bit.
Sorry for being late, but are you sure you linked the correct commit with this issue? I'm not very familiar with pyroute2, but it seems that it may belong to another issue. I'm not sure that calling gc() at a 5-second interval solves this issue, as the timing between the steps (see the original bug report) is uncertain. But as I said, I'm not familiar with your code, I just wanted to raise attention.
@csernazs np, thanks for keeping an eye on the issue!
That's not Python's gc, but IPDB's own gc. And yep, it is directly related to the issue. Actually, when IPDB handles deletion of dependent routes, it doesn't remove them from the DB, but marks them with route['ipdb_scope'] == gc (they're hidden from the public API, like keys() etc., but are still directly accessible). Then comes some magic: when IPDB's gc hits such records, it tries to validate them against the system, and then either removes them completely or restores them in the DB. And the trick is how to do that without affecting overall performance at the same time.
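The mark-then-validate idea described above can be modelled roughly as follows. This is a toy sketch, not pyroute2's real code: the `RouteTable` class, its methods, the `"system"` scope value and the `exists_in_system` callback are all assumptions made for illustration; only the `ipdb_scope` key and the `gc` marker come from the description.

```python
class RouteTable:
    """Toy model of a route DB with a 'gc' scope; not pyroute2's real class."""

    def __init__(self):
        self._records = {}

    def add(self, dst, **spec):
        # "system" is a placeholder scope for a normal, visible record.
        self._records[dst] = dict(spec, dst=dst, ipdb_scope="system")

    def keys(self):
        # Records marked for gc are hidden from the public view.
        return [d for d, r in self._records.items() if r["ipdb_scope"] != "gc"]

    def mark_gc(self, dst):
        # Instead of deleting a dependent route right away, only mark it.
        self._records[dst]["ipdb_scope"] = "gc"

    def gc(self, exists_in_system):
        """Validate marked records: drop those that are gone, restore the rest."""
        for dst in list(self._records):
            record = self._records[dst]
            if record["ipdb_scope"] != "gc":
                continue
            if exists_in_system(dst):
                record["ipdb_scope"] = "system"  # still in the kernel: restore
            else:
                del self._records[dst]  # really gone: remove completely


table = RouteTable()
table.add("192.168.72.0/24", gateway="192.168.71.254")
table.add("10.0.0.0/8", gateway="172.16.0.1")
table.mark_gc("192.168.72.0/24")
table.mark_gc("10.0.0.0/8")
hidden_view = table.keys()
print(hidden_view)  # []

kernel_routes = {"10.0.0.0/8"}  # stand-in for a real lookup in the system
table.gc(lambda dst: dst in kernel_routes)
print(table.keys())  # ['10.0.0.0/8']
```

Only the marked records are validated, which is why this approach can be cheaper than re-reading the whole routing table.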
NB: I'm cutting 0.4.18 right now, so the feature will be included. But the docs will be ready a bit later.
The kernel not sending route updates on interface delete didn't sound right to me, so I did some digging...
For reference:
- When an IPv4 route is deleted manually, rtmsg_fib() is called here: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/net/ipv4/fib_trie.c?h=v4.12#n1566
- When an IPv4 interface is deleted, https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/net/ipv4/fib_frontend.c?h=v4.12#n1035 calls https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/net/ipv4/fib_semantics.c?h=v4.12#n1333 which marks dependent routes with RTNH_F_DEAD, then it calls https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/net/ipv4/fib_trie.c?h=v4.12#n1871 which deletes routes marked with RTNH_F_DEAD without calling rtmsg_fib().
- This is apparently intentional behavior: http://www.spinics.net/lists/netdev/msg254186.html
- However, when an IPv6 interface is deleted, RTM_DELROUTE messages are sent for all dependent routes.
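To observe the difference between the IPv4 and IPv6 cases, one could watch which route notifications actually arrive. The sketch below is an illustration only: `route_events` and `watch_routes` are hypothetical helper names, and it assumes that messages are dict-like with an `event` key, as pyroute2's parsed netlink messages are. The live part needs pyroute2 installed and is not called here.

```python
def route_events(messages, wanted=("RTM_NEWROUTE", "RTM_DELROUTE")):
    """Keep only route notifications from an iterable of dict-like messages."""
    return [msg for msg in messages if msg.get("event") in wanted]


def watch_routes():
    """Print route events as they arrive; loops until interrupted."""
    # Imported lazily so that route_events() has no external dependency.
    from pyroute2 import IPRoute

    with IPRoute() as ipr:
        ipr.bind()  # subscribe to netlink broadcast notifications
        while True:
            for msg in route_events(ipr.get()):
                print(msg["event"], msg.get_attr("RTA_DST"))


# Offline check with hand-made messages instead of a live netlink socket.
sample = [
    {"event": "RTM_DELADDR"},
    {"event": "RTM_DELROUTE"},
    {"event": "RTM_NEWLINK"},
]
seen = [m["event"] for m in route_events(sample)]
print(seen)  # ['RTM_DELROUTE']
```

Running `watch_routes()` in one terminal while deleting an address in another should show whether an RTM_DELROUTE is broadcast for the dependent routes.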