
bfdd: large number of raw sockets for vrfs results in severe degradation of bandwidth for all network traffic


Description

On Linux, when bfdd is running in its default configuration and multiple VRFs are present in the system, TCP bandwidth measured between endpoints on the same host drops significantly. (Echo mode was not enabled in our tests.)

Version

10.1 and several previous versions

How to reproduce

Create two namespaces connected by a veth pair. In one namespace, run iperf3 in server mode; in the other, run it in client mode. Note the measured bandwidth.
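A minimal sketch of this setup, assuming iproute2 and iperf3 are available; the namespace names, interface names, and addresses are placeholders:

    # two namespaces joined by a veth pair
    ip netns add ns1
    ip netns add ns2
    ip link add veth1 type veth peer name veth2
    ip link set veth1 netns ns1
    ip link set veth2 netns ns2
    ip -n ns1 addr add 10.0.0.1/24 dev veth1
    ip -n ns2 addr add 10.0.0.2/24 dev veth2
    ip -n ns1 link set veth1 up
    ip -n ns2 link set veth2 up
    # baseline bandwidth measurement across the pair
    ip netns exec ns1 iperf3 -s -D
    ip netns exec ns2 iperf3 -c 10.0.0.1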

In the default namespace, create a number of VRFs (they do not need to represent any BGP EVPN instances).
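A sketch of this step, again using iproute2; the VRF names, count, and routing-table numbers are arbitrary placeholders:

    # create 250 plain VRF devices; per the issue title,
    # bfdd opens raw sockets for these VRFs
    for i in $(seq 1 250); do
        ip link add "vrf$i" type vrf table $((1000 + i))
        ip link set "vrf$i" up
    done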

Rerun the iperf measurement and observe a significant decrease in bandwidth.

(In our tests, 50 VRFs produce a noticeable effect; 250 result in a 3-fold drop in bandwidth.)

Expected behavior

Running bfdd is not expected to have a significant impact on system performance.

Actual behavior

Running bfdd on a system that has multiple VRFs results in significant degradation of network bandwidth.

Additional context

There is a Slack discussion starting with this message:

https://frrouting.slack.com/archives/C9F2ATASC/p1723557762197969

The text of the discussion as of the opening of this ticket is attached: frr-bfdd-slack.txt

Checklist

  • [X] I have searched the open issues for this bug.
  • [X] I have not included sensitive information in this report.

crosser • Aug 15 '24 13:08

I made a low-cost attempt to tell autoconf not to define BFD_LINUX, in the hope that the generic UDP code path would be used instead. But it did not seem to work.

That is, bfdd apparently is not replying (single-hop BFD control packets must arrive with TTL 255 per RFC 5881, so packets reporting any other TTL are rejected), and these messages appear in the log:

bfdd[4827]: [YA0Q5-C0BPV] control-packet: invalid TTL: 0 expected 255 [mhop:no peer:10.42.54.1]

crosser • Aug 16 '24 12:08