pim6d: CPU usage increases too much with many multicast streams
Description
In our environment, we use IPv6 multicast:
- The multicast sender is in VM1, where we enabled ipv6 pim.
- The switch also has IGMPv3, MLDv2, PIM for IPv6, and OSPFv3 enabled.
- The multicast receiver joins the (S,G): (2001:72:101::94:16, ff35:94::1)
And the problems are:
- When we start only a few multicast streams with a small payload, it works well.
- But after we start 1000 multicast streams with 160-byte payloads at 20 ms intervals, the CPU usage rises to about 15%.
- With the same number of streams and the same payload size over IPv4 there is no problem; the CPU usage stays below 1%.
- Our production payload is larger than this, and then the CPU usage is even higher.
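For scale, the load described above works out to the following aggregate rates (my arithmetic, not figures from the report; it assumes one 160-byte packet per stream every 20 ms):

```python
# Back-of-the-envelope rates for the reported load:
# 1000 streams, 160-byte payload, one packet per 20 ms per stream.
streams = 1000
payload_bytes = 160
interval_ms = 20

pps_per_stream = 1000 // interval_ms            # 50 packets/s per stream
total_pps = streams * pps_per_stream            # 50,000 packets/s aggregate
total_payload_Bps = total_pps * payload_bytes   # 8,000,000 bytes/s of payload

print(pps_per_stream, total_pps, total_payload_Bps)
```

So pim6d is handling roughly 50,000 packets per second of forwarding-plane activity, which makes a 15x CPU difference versus pimd at the same rate notable.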
Version
FRRouting 10.0.1 (mcptt-cp) on Linux(5.14.0-427.57.1.el9_4.x86_64).
Copyright 1996-2005 Kunihiro Ishiguro, et al.
configured with:
'--build=x86_64-redhat-linux-gnu' '--host=x86_64-redhat-linux-gnu' '--program-prefix=' '--disable-dependency-tracking' '--prefix=/usr' '--exec-prefix=/usr' '--bindir=/usr/bin' '--datadir=/usr/share' '--includedir=/usr/include' '--libdir=/usr/lib64' '--libexecdir=/usr/libexec' '--sharedstatedir=/var/lib' '--mandir=/usr/share/man' '--infodir=/usr/share/info' '--sbindir=/usr/lib/frr' '--sysconfdir=/etc' '--localstatedir=/var' '--disable-static' '--disable-werror' '--enable-multipath=256' '--enable-vtysh' '--enable-ospfclient' '--enable-ospfapi' '--enable-rtadv' '--enable-ldpd' '--enable-pimd' '--enable-pim6d' '--enable-pbrd' '--enable-nhrpd' '--enable-eigrpd' '--enable-babeld' '--enable-vrrpd' '--enable-user=frr' '--enable-group=frr' '--enable-vty-group=frrvty' '--enable-fpm' '--enable-watchfrr' '--disable-bgp-vnc' '--enable-isisd' '--enable-rpki' '--enable-bfdd' '--enable-pathd' '--enable-snmp' 'build_alias=x86_64-redhat-linux-gnu' 'host_alias=x86_64-redhat-linux-gnu' 'PKG_CONFIG_PATH=:/usr/lib64/pkgconfig:/usr/share/pkgconfig' 'CC=gcc' 'CXX=g++' 'LT_SYS_LIBRARY_PATH=/usr/lib64:'
How to reproduce
- Set up FRR with PIMv6.
- Start joins for the IPv6 (S,G) entries.
- Check the CPU usage with the top command.
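A minimal interface configuration sketch for the first step might look like the following (the interface name is a placeholder, and the exact command syntax is an assumption; verify it against the documentation for your FRR release):

```
! sketch only -- interface name and commands may differ per FRR version
interface eth0
 ipv6 pim
 ipv6 mld
```

The receiver-side joins for the (S,G) listed above would then come from the host application or test tool, not from FRR itself.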
Expected behavior
The pim6d CPU usage should be similar to the pimd process CPU usage.
Actual behavior
The pim6d CPU usage is far higher than expected.
Additional context
I reported this bug before as https://github.com/FRRouting/frr/issues/16071. After we upgraded the OS kernel to "5.14.0-427.57.1.el9_4.x86_64", the error log no longer reproduces, but the CPU usage is still abnormal.
Checklist
- [x] I have searched the open issues for this bug.
- [x] I have not included sensitive information in this report.
Can we get a flamegraph of what pim6d is doing at the time its usage is high? https://github.com/FRRouting/frr/wiki/Perf-Recording
I installed perf and frr-debuginfo, then ran the commands below to generate the files:
perf record -g --call-graph=dwarf -p 21969 -- sleep 10
perf script | ./stackcollapse-perf.pl | ./flamegraph.pl > pim6d_debug_flamegraph.svg
I just installed the kernel-debuginfo-5.14.0-427.57.1.el9_4.x86_64.rpm package and tested again:
perf record -g --call-graph=dwarf -p 777 -- sleep 30
perf script > out.perf
./stackcollapse-perf.pl out.perf | ./flamegraph.pl > pim6d_debug_flamegraph.svg
mv perf.data perf_ipv6_with_debug.data
tar -czvf /tmp/perf_data_2025_03_25_02_ipv6.tar.gz perf_ipv6_with_debug.data out.perf pim6d_debug_flamegraph.svg
Hi @donaldsharp, did you check the perf data? If you have any findings, please let me know. Thanks~~
Hi @donaldsharp, is there any other data you need that would help with the issue?
@Pengwei-Chen-1 Hi, did you by any chance try to produce the flame graph for IPv4 as well? I know that the code is different, but the network and multicast handling paths could be similar. The difference could help pinpoint the issue.
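For a like-for-like comparison, the same load would need to be generated for both address families. A minimal Python sender sketch is below (the group address and port are placeholders taken from this report, the timing loop is simplified, and an IPv4 run would use `AF_INET` with an `ff35:...`-equivalent IPv4 group):

```python
import socket
import time

def make_payload(size: int) -> bytes:
    """Build a dummy payload of the given size (160 bytes in this report)."""
    return b"\x00" * size

def send_stream(group: str, port: int, payload: bytes, count: int,
                interval_s: float = 0.020) -> None:
    """Send `count` UDP packets to an IPv6 multicast group at ~20 ms spacing."""
    sock = socket.socket(socket.AF_INET6, socket.SOCK_DGRAM)
    # Limit hop count so test traffic does not propagate past the first router.
    sock.setsockopt(socket.IPPROTO_IPV6, socket.IPV6_MULTICAST_HOPS, 1)
    try:
        for _ in range(count):
            sock.sendto(payload, (group, port))
            time.sleep(interval_s)
    finally:
        sock.close()

# Example call (placeholder group/port from the report):
# send_stream("ff35:94::1", 5000, make_payload(160), count=50)
```

Recording pimd the same way as pim6d (the perf commands already in this thread, pointed at the pimd PID) would then give the two flame graphs to diff.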
This issue is stale because it has been open 180 days with no activity. Comment or remove the autoclose label in order to avoid having this issue closed.
This issue will be automatically closed in the specified period unless there is further activity.
Hi @donaldsharp, do you have any tips on what debugging steps could be done next here?
This issue will no longer be automatically closed.