core icon indicating copy to clipboard operation
core copied to clipboard

pppoe connection breaks on heavy throughput ("LCP: no reply to 1 echo request")

Open badboy opened this issue 2 years ago • 2 comments

Important notices

Before you add a new report, we ask you kindly to acknowledge the following:

  • [x] I have read the contributing guide lines at https://github.com/opnsense/core/blob/master/CONTRIBUTING.md
  • [x] I am convinced that my issue is new after having checked both open and closed issues at https://github.com/opnsense/core/issues?q=is%3Aissue

Describe the bug

I recently migrated my opnsense to a ZimaBoard. My upstream connection is done via pppoe.

When filling up the available bandwidth, e.g. with a huge speedy download from a client, the pppoe connection breaks and does not recover until a reboot. I can trigger this easily by running a speedtest (e.g. using https://www.speedtest.net/)

Expected behavior

A stable pppoe connection, no matter the throughput from clients.

Describe alternatives you considered

None so far on this machine. Previously running opnsense on an APU without problems

Relevant log files

/var/log/ppps/ppps_20220525.log:

2022-05-25T10:26:41+00:00 $host ppp 8727 - [meta sequenceId="1"] [opt2_link0] LCP: no reply to 1 echo request(s)
2022-05-25T10:26:51+00:00 $host ppp 8727 - [meta sequenceId="2"] [opt2_link0] LCP: no reply to 2 echo request(s)
2022-05-25T10:27:01+00:00 $host ppp 8727 - [meta sequenceId="3"] [opt2_link0] LCP: no reply to 3 echo request(s)
2022-05-25T10:27:11+00:00 $host ppp 8727 - [meta sequenceId="4"] [opt2_link0] LCP: no reply to 4 echo request(s)
2022-05-25T10:27:21+00:00 $host ppp 8727 - [meta sequenceId="5"] [opt2_link0] LCP: no reply to 5 echo request(s)
2022-05-25T10:27:21+00:00 $host ppp 8727 - [meta sequenceId="6"] [opt2_link0] LCP: peer not responding to echo requests
2022-05-25T10:27:21+00:00 $host ppp 8727 - [meta sequenceId="7"] [opt2_link0] LCP: state change Opened --> Stopping
2022-05-25T10:27:21+00:00 $host ppp 8727 - [meta sequenceId="8"] [opt2_link0] Link: Leave bundle "opt2"
2022-05-25T10:27:21+00:00 $host ppp 8727 - [meta sequenceId="9"] [opt2] Bundle: Status update: up 0 links, total bandwidth 9600 bps
2022-05-25T10:27:21+00:00 $host ppp 8727 - [meta sequenceId="10"] [opt2] IPCP: Close event
2022-05-25T10:27:21+00:00 $host ppp 8727 - [meta sequenceId="11"] [opt2] IPCP: state change Opened --> Closing
2022-05-25T10:27:21+00:00 $host ppp 8727 - [meta sequenceId="12"] [opt2] IPCP: SendTerminateReq #4
2022-05-25T10:27:21+00:00 $host ppp 8727 - [meta sequenceId="13"] [opt2] IPCP: LayerDown
2022-05-25T10:27:21+00:00 $host ppp 8727 - [meta sequenceId="14"] [opt2] IPV6CP: Close event
2022-05-25T10:27:21+00:00 $host ppp 8727 - [meta sequenceId="15"] [opt2] IPV6CP: state change Opened --> Closing
2022-05-25T10:27:21+00:00 $host ppp 8727 - [meta sequenceId="16"] [opt2] IPV6CP: SendTerminateReq #2
2022-05-25T10:27:21+00:00 $host ppp 8727 - [meta sequenceId="17"] [opt2] IPV6CP: LayerDown
2022-05-25T10:27:22+00:00 $host ppp 8727 - [meta sequenceId="18"] [opt2] IFACE: Down event
2022-05-25T10:27:22+00:00 $host ppp 8727 - [meta sequenceId="19"] [opt2] IFACE: Rename interface pppoe0 to pppoe0
2022-05-25T10:27:22+00:00 $host ppp 8727 - [meta sequenceId="20"] [opt2] IPCP: Down event
2022-05-25T10:27:22+00:00 $host ppp 8727 - [meta sequenceId="21"] [opt2] IPCP: LayerFinish
2022-05-25T10:27:22+00:00 $host ppp 8727 - [meta sequenceId="22"] [opt2] IPCP: state change Closing --> Initial
2022-05-25T10:27:22+00:00 $host ppp 8727 - [meta sequenceId="23"] [opt2] IPV6CP: Down event
2022-05-25T10:27:22+00:00 $host ppp 8727 - [meta sequenceId="24"] [opt2] IPV6CP: LayerFinish
2022-05-25T10:27:22+00:00 $host ppp 8727 - [meta sequenceId="25"] [opt2] Bundle: No NCPs left. Closing links...
2022-05-25T10:27:22+00:00 $host ppp 8727 - [meta sequenceId="26"] [opt2] IPV6CP: state change Closing --> Initial
2022-05-25T10:27:22+00:00 $host ppp 8727 - [meta sequenceId="27"] [opt2_link0] LCP: SendTerminateReq #2
2022-05-25T10:27:22+00:00 $host ppp 8727 - [meta sequenceId="28"] [opt2_link0] LCP: LayerDown
2022-05-25T10:27:24+00:00 $host ppp 8727 - [meta sequenceId="29"] [opt2_link0] LCP: SendTerminateReq #3
2022-05-25T10:27:26+00:00 $host ppp 8727 - [meta sequenceId="30"] [opt2_link0] LCP: state change Stopping --> Stopped
2022-05-25T10:27:26+00:00 $host ppp 8727 - [meta sequenceId="31"] [opt2_link0] LCP: LayerFinish
2022-05-25T10:27:26+00:00 $host ppp 8727 - [meta sequenceId="32"] [opt2_link0] PPPoE: connection closed
2022-05-25T10:27:26+00:00 $host ppp 8727 - [meta sequenceId="33"] [opt2_link0] Link: DOWN event
2022-05-25T10:27:26+00:00 $host ppp 8727 - [meta sequenceId="34"] [opt2_link0] LCP: Down event
2022-05-25T10:27:26+00:00 $host ppp 8727 - [meta sequenceId="35"] [opt2_link0] LCP: state change Stopped --> Starting
2022-05-25T10:27:26+00:00 $host ppp 8727 - [meta sequenceId="36"] [opt2_link0] LCP: LayerStart
2022-05-25T10:27:26+00:00 $host ppp 8727 - [meta sequenceId="37"] [opt2_link0] Link: reconnection attempt 1 in 1 seconds
2022-05-25T10:27:27+00:00 $host ppp 8727 - [meta sequenceId="38"] [opt2_link0] Link: reconnection attempt 1
2022-05-25T10:27:27+00:00 $host ppp 8727 - [meta sequenceId="39"] [opt2_link0] PPPoE: Connecting to ''
2022-05-25T10:27:36+00:00 $host ppp 8727 - [meta sequenceId="40"] [opt2_link0] PPPoE connection timeout after 9 seconds
2022-05-25T10:27:36+00:00 $host ppp 8727 - [meta sequenceId="41"] [opt2_link0] Link: DOWN event
2022-05-25T10:27:36+00:00 $host ppp 8727 - [meta sequenceId="42"] [opt2_link0] LCP: Down event
2022-05-25T10:27:36+00:00 $host ppp 8727 - [meta sequenceId="43"] [opt2_link0] Link: reconnection attempt 2 in 1 seconds
2022-05-25T10:27:37+00:00 $host ppp 8727 - [meta sequenceId="44"] [opt2_link0] Link: reconnection attempt 2
2022-05-25T10:27:37+00:00 $host ppp 8727 - [meta sequenceId="45"] [opt2_link0] PPPoE: Connecting to ''
2022-05-25T10:27:46+00:00 $host ppp 8727 - [meta sequenceId="46"] [opt2_link0] PPPoE connection timeout after 9 seconds
2022-05-25T10:27:46+00:00 $host ppp 8727 - [meta sequenceId="47"] [opt2_link0] Link: DOWN event
2022-05-25T10:27:46+00:00 $host ppp 8727 - [meta sequenceId="48"] [opt2_link0] LCP: Down event
2022-05-25T10:27:46+00:00 $host ppp 8727 - [meta sequenceId="49"] [opt2_link0] Link: reconnection attempt 3 in 4 seconds
2022-05-25T10:27:50+00:00 $host ppp 8727 - [meta sequenceId="50"] [opt2_link0] Link: reconnection attempt 3
2022-05-25T10:27:50+00:00 $host ppp 8727 - [meta sequenceId="51"] [opt2_link0] PPPoE: Connecting to ''
2022-05-25T10:28:00+00:00 $host ppp 8727 - [meta sequenceId="52"] [opt2_link0] PPPoE connection timeout after 9 seconds
2022-05-25T10:28:00+00:00 $host ppp 8727 - [meta sequenceId="53"] [opt2_link0] Link: DOWN event
2022-05-25T10:28:00+00:00 $host ppp 8727 - [meta sequenceId="54"] [opt2_link0] LCP: Down event
2022-05-25T10:28:00+00:00 $host ppp 8727 - [meta sequenceId="55"] [opt2_link0] Link: reconnection attempt 4 in 2 seconds
2022-05-25T10:28:02+00:00 $host ppp 8727 - [meta sequenceId="56"] [opt2_link0] Link: reconnection attempt 4
2022-05-25T10:28:02+00:00 $host ppp 8727 - [meta sequenceId="57"] [opt2_link0] PPPoE: Connecting to ''
2022-05-25T10:28:11+00:00 $host ppp 8727 - [meta sequenceId="58"] [opt2_link0] PPPoE connection timeout after 9 seconds
2022-05-25T10:28:11+00:00 $host ppp 8727 - [meta sequenceId="59"] [opt2_link0] Link: DOWN event
2022-05-25T10:28:11+00:00 $host ppp 8727 - [meta sequenceId="60"] [opt2_link0] LCP: Down event

(this continues until I reboot the device)

If applicable, information from log files supporting your claim.

Additional context

This was initially happening very often. We have a very similar setup with near identical configuration at another site, that is not experiencing this problem. One difference we noted early is that VT-d wasn't enabled in the BIOS. This is now enabled. This seemed to have flipped something to turn off MSI:

# sysctl -a | grep MSI
re0: Using 1 MSI-X message
re0: turning off MSI enable bit.

With this the issue occurs only for downloads >300 mbps. Previously it happend far earlier.

Note: a software reboot causes MSI to be enabled again (no "turning off MSI enable bit" in the sysctl output). Only a full power cycle brings it back. Not sure if this is a bug in itself.

Environment

Software version used and hardware type if relevant, e.g.:

OPNsense 22.1.8_1-amd64 FreeBSD 13.0-STABLE OpenSSL 1.1.1o 3 May 2022 Intel(R) Celeron(R) CPU N3450 @ 1.10GHz (4 cores, 4 threads)

badboy avatar May 29 '22 16:05 badboy

Before I forget:

re0@pci0:2:0:0:	class=0x020000 rev=0x15 hdr=0x00 vendor=0x10ec device=0x8168 subvendor=0x10ec subdevice=0x0123
    vendor     = 'Realtek Semiconductor Co., Ltd.'
    device     = 'RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller'
    class      = network
    subclass   = ethernet
re1@pci0:3:0:0:	class=0x020000 rev=0x15 hdr=0x00 vendor=0x10ec device=0x8168 subvendor=0x10ec subdevice=0x0123
    vendor     = 'Realtek Semiconductor Co., Ltd.'
    device     = 'RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller'
    class      = network
    subclass   = ethernet

And this blogpost reports problems with that NIC as well. For now I have shaped down the bandwidth to stop the issue (but will switch out the board soon to get my full bandwidth back).

badboy avatar Jun 01 '22 13:06 badboy

Just for reference here are the re-related FreeBSD bugs and it seems to be a mess since 2012: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=166724 https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=186949 https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=208205

eimann avatar Jul 22 '22 07:07 eimann

This issue has been automatically timed-out (after 180 days of inactivity).

For more information about the policies for this repository, please read https://github.com/opnsense/core/blob/master/CONTRIBUTING.md for further details.

If someone wants to step up and work on this issue, just let us know, so we can reopen the issue and assign an owner to it.

OPNsense-bot avatar Nov 25 '22 16:11 OPNsense-bot