frr icon indicating copy to clipboard operation
frr copied to clipboard

zebra: fix Out Of Memory issue when displaying large route tables in JSON

Open louis-6wind opened this issue 1 year ago • 6 comments

The command show ip[v6] route [vrf (all|XX)] json may cause an out-of-memory error when attempting to display large route tables.

Currently, JSON objects are created for every prefix and stored in a table JSON object. Once all the prefixes are collected, the table JSON is dumped, and all associated JSON objects are freed. Since commit 0e2fc3d67f, table JSON objects for each VRF are stored in a root JSON object. Similarly, the root JSON is dumped, and all related JSON objects are freed.

When the route tables are large, this process allocates a significant amount of memory for all the JSON objects containing the prefixes. In some cases, this leads to a crash of the protocol daemon due to insufficient memory on the machine.

To address this, instead of storing all the prefix JSON objects in underlying JSON objects before displaying them, modify the approach to allocate a JSON object for the first prefix, display it, and then free it. Repeat this process for each subsequent prefix object.

louis-6wind avatar May 27 '24 09:05 louis-6wind

I'm curious in what kind of setup OOM is happening? 128Mb of memory? My point is if we really need this custom crafted stuff versus reusing what the library offers?

Tested with a full-route and "show ip route json". Got a peak of 8.6g, which is not reasonable.

# top -n 1 -p $(pidof zebra) |grep zebra
 1014 root      20   0 9779220   8.6g  11120 R  93.3  36.5   3:36.34 zebra   

louis-6wind avatar May 27 '24 11:05 louis-6wind

We already do the same thing for "show bgp XX json" https://github.com/FRRouting/frr/blob/f7712516d8ea19629d346eca1d5ad71831ec1f22/bgpd/bgp_route.c#L11950

louis-6wind avatar May 27 '24 11:05 louis-6wind

Ah, GiB is not cool ;-)

ton31337 avatar May 27 '24 11:05 ton31337

@donaldsharp, the patch is improving a little bit the performances.

It was tested with 760K routes.

Before patches

# time vtysh -c 'show ip route json' >/dev/null
% Can't open configuration file /etc/frr/vtysh.conf due to 'No such file or directory'.
Configuration file[/etc/frr/frr.conf] processing failure: 11

real	0m14.506s
user	0m0.366s
sys	0m0.084s

After patches

# time vtysh -c 'show ip route json' >/dev/null
% Can't open configuration file /etc/frr/vtysh.conf due to 'No such file or directory'.
Configuration file[/etc/frr/frr.conf] processing failure: 11

real	0m10.818s
user	0m0.203s
sys	0m0.181s

louis-6wind avatar May 28 '24 15:05 louis-6wind

tested this code locally and I am seeing significant savings in run time too. Let's get this in.

donaldsharp avatar May 28 '24 21:05 donaldsharp

Once CI finishes I'll get this in

donaldsharp avatar May 29 '24 11:05 donaldsharp

please fix your commits - we can't really have "revert", and "fix", "fix", "fix". please rebase the changes into a series of commits that's meaningful.

"fix" commits were pushed by mistake. Done

louis-6wind avatar Jun 04 '24 12:06 louis-6wind

ci:rerun

louis-6wind avatar Jun 07 '24 13:06 louis-6wind