core
core copied to clipboard
pppoe connection breaks on heavy throughput ("LCP: no reply to 1 echo request")
Important notices
Before you add a new report, we ask you kindly to acknowledge the following:
- [x] I have read the contributing guide lines at https://github.com/opnsense/core/blob/master/CONTRIBUTING.md
- [x] I am convinced that my issue is new after having checked both open and closed issues at https://github.com/opnsense/core/issues?q=is%3Aissue
Describe the bug
I recently migrated my opnsense to a ZimaBoard. My upstream connection is done via pppoe.
When filling up the available bandwidth, e.g. with a huge speedy download from a client, the pppoe connection breaks and does not recover until a reboot. I can trigger this easily by running a speedtest (e.g. using https://www.speedtest.net/)
Expected behavior
A stable pppoe connection, no matter the throughput from clients.
Describe alternatives you considered
None so far on this machine. Previously running opnsense on an APU without problems
Relevant log files
/var/log/ppps/ppps_20220525.log
:
2022-05-25T10:26:41+00:00 $host ppp 8727 - [meta sequenceId="1"] [opt2_link0] LCP: no reply to 1 echo request(s)
2022-05-25T10:26:51+00:00 $host ppp 8727 - [meta sequenceId="2"] [opt2_link0] LCP: no reply to 2 echo request(s)
2022-05-25T10:27:01+00:00 $host ppp 8727 - [meta sequenceId="3"] [opt2_link0] LCP: no reply to 3 echo request(s)
2022-05-25T10:27:11+00:00 $host ppp 8727 - [meta sequenceId="4"] [opt2_link0] LCP: no reply to 4 echo request(s)
2022-05-25T10:27:21+00:00 $host ppp 8727 - [meta sequenceId="5"] [opt2_link0] LCP: no reply to 5 echo request(s)
2022-05-25T10:27:21+00:00 $host ppp 8727 - [meta sequenceId="6"] [opt2_link0] LCP: peer not responding to echo requests
2022-05-25T10:27:21+00:00 $host ppp 8727 - [meta sequenceId="7"] [opt2_link0] LCP: state change Opened --> Stopping
2022-05-25T10:27:21+00:00 $host ppp 8727 - [meta sequenceId="8"] [opt2_link0] Link: Leave bundle "opt2"
2022-05-25T10:27:21+00:00 $host ppp 8727 - [meta sequenceId="9"] [opt2] Bundle: Status update: up 0 links, total bandwidth 9600 bps
2022-05-25T10:27:21+00:00 $host ppp 8727 - [meta sequenceId="10"] [opt2] IPCP: Close event
2022-05-25T10:27:21+00:00 $host ppp 8727 - [meta sequenceId="11"] [opt2] IPCP: state change Opened --> Closing
2022-05-25T10:27:21+00:00 $host ppp 8727 - [meta sequenceId="12"] [opt2] IPCP: SendTerminateReq #4
2022-05-25T10:27:21+00:00 $host ppp 8727 - [meta sequenceId="13"] [opt2] IPCP: LayerDown
2022-05-25T10:27:21+00:00 $host ppp 8727 - [meta sequenceId="14"] [opt2] IPV6CP: Close event
2022-05-25T10:27:21+00:00 $host ppp 8727 - [meta sequenceId="15"] [opt2] IPV6CP: state change Opened --> Closing
2022-05-25T10:27:21+00:00 $host ppp 8727 - [meta sequenceId="16"] [opt2] IPV6CP: SendTerminateReq #2
2022-05-25T10:27:21+00:00 $host ppp 8727 - [meta sequenceId="17"] [opt2] IPV6CP: LayerDown
2022-05-25T10:27:22+00:00 $host ppp 8727 - [meta sequenceId="18"] [opt2] IFACE: Down event
2022-05-25T10:27:22+00:00 $host ppp 8727 - [meta sequenceId="19"] [opt2] IFACE: Rename interface pppoe0 to pppoe0
2022-05-25T10:27:22+00:00 $host ppp 8727 - [meta sequenceId="20"] [opt2] IPCP: Down event
2022-05-25T10:27:22+00:00 $host ppp 8727 - [meta sequenceId="21"] [opt2] IPCP: LayerFinish
2022-05-25T10:27:22+00:00 $host ppp 8727 - [meta sequenceId="22"] [opt2] IPCP: state change Closing --> Initial
2022-05-25T10:27:22+00:00 $host ppp 8727 - [meta sequenceId="23"] [opt2] IPV6CP: Down event
2022-05-25T10:27:22+00:00 $host ppp 8727 - [meta sequenceId="24"] [opt2] IPV6CP: LayerFinish
2022-05-25T10:27:22+00:00 $host ppp 8727 - [meta sequenceId="25"] [opt2] Bundle: No NCPs left. Closing links...
2022-05-25T10:27:22+00:00 $host ppp 8727 - [meta sequenceId="26"] [opt2] IPV6CP: state change Closing --> Initial
2022-05-25T10:27:22+00:00 $host ppp 8727 - [meta sequenceId="27"] [opt2_link0] LCP: SendTerminateReq #2
2022-05-25T10:27:22+00:00 $host ppp 8727 - [meta sequenceId="28"] [opt2_link0] LCP: LayerDown
2022-05-25T10:27:24+00:00 $host ppp 8727 - [meta sequenceId="29"] [opt2_link0] LCP: SendTerminateReq #3
2022-05-25T10:27:26+00:00 $host ppp 8727 - [meta sequenceId="30"] [opt2_link0] LCP: state change Stopping --> Stopped
2022-05-25T10:27:26+00:00 $host ppp 8727 - [meta sequenceId="31"] [opt2_link0] LCP: LayerFinish
2022-05-25T10:27:26+00:00 $host ppp 8727 - [meta sequenceId="32"] [opt2_link0] PPPoE: connection closed
2022-05-25T10:27:26+00:00 $host ppp 8727 - [meta sequenceId="33"] [opt2_link0] Link: DOWN event
2022-05-25T10:27:26+00:00 $host ppp 8727 - [meta sequenceId="34"] [opt2_link0] LCP: Down event
2022-05-25T10:27:26+00:00 $host ppp 8727 - [meta sequenceId="35"] [opt2_link0] LCP: state change Stopped --> Starting
2022-05-25T10:27:26+00:00 $host ppp 8727 - [meta sequenceId="36"] [opt2_link0] LCP: LayerStart
2022-05-25T10:27:26+00:00 $host ppp 8727 - [meta sequenceId="37"] [opt2_link0] Link: reconnection attempt 1 in 1 seconds
2022-05-25T10:27:27+00:00 $host ppp 8727 - [meta sequenceId="38"] [opt2_link0] Link: reconnection attempt 1
2022-05-25T10:27:27+00:00 $host ppp 8727 - [meta sequenceId="39"] [opt2_link0] PPPoE: Connecting to ''
2022-05-25T10:27:36+00:00 $host ppp 8727 - [meta sequenceId="40"] [opt2_link0] PPPoE connection timeout after 9 seconds
2022-05-25T10:27:36+00:00 $host ppp 8727 - [meta sequenceId="41"] [opt2_link0] Link: DOWN event
2022-05-25T10:27:36+00:00 $host ppp 8727 - [meta sequenceId="42"] [opt2_link0] LCP: Down event
2022-05-25T10:27:36+00:00 $host ppp 8727 - [meta sequenceId="43"] [opt2_link0] Link: reconnection attempt 2 in 1 seconds
2022-05-25T10:27:37+00:00 $host ppp 8727 - [meta sequenceId="44"] [opt2_link0] Link: reconnection attempt 2
2022-05-25T10:27:37+00:00 $host ppp 8727 - [meta sequenceId="45"] [opt2_link0] PPPoE: Connecting to ''
2022-05-25T10:27:46+00:00 $host ppp 8727 - [meta sequenceId="46"] [opt2_link0] PPPoE connection timeout after 9 seconds
2022-05-25T10:27:46+00:00 $host ppp 8727 - [meta sequenceId="47"] [opt2_link0] Link: DOWN event
2022-05-25T10:27:46+00:00 $host ppp 8727 - [meta sequenceId="48"] [opt2_link0] LCP: Down event
2022-05-25T10:27:46+00:00 $host ppp 8727 - [meta sequenceId="49"] [opt2_link0] Link: reconnection attempt 3 in 4 seconds
2022-05-25T10:27:50+00:00 $host ppp 8727 - [meta sequenceId="50"] [opt2_link0] Link: reconnection attempt 3
2022-05-25T10:27:50+00:00 $host ppp 8727 - [meta sequenceId="51"] [opt2_link0] PPPoE: Connecting to ''
2022-05-25T10:28:00+00:00 $host ppp 8727 - [meta sequenceId="52"] [opt2_link0] PPPoE connection timeout after 9 seconds
2022-05-25T10:28:00+00:00 $host ppp 8727 - [meta sequenceId="53"] [opt2_link0] Link: DOWN event
2022-05-25T10:28:00+00:00 $host ppp 8727 - [meta sequenceId="54"] [opt2_link0] LCP: Down event
2022-05-25T10:28:00+00:00 $host ppp 8727 - [meta sequenceId="55"] [opt2_link0] Link: reconnection attempt 4 in 2 seconds
2022-05-25T10:28:02+00:00 $host ppp 8727 - [meta sequenceId="56"] [opt2_link0] Link: reconnection attempt 4
2022-05-25T10:28:02+00:00 $host ppp 8727 - [meta sequenceId="57"] [opt2_link0] PPPoE: Connecting to ''
2022-05-25T10:28:11+00:00 $host ppp 8727 - [meta sequenceId="58"] [opt2_link0] PPPoE connection timeout after 9 seconds
2022-05-25T10:28:11+00:00 $host ppp 8727 - [meta sequenceId="59"] [opt2_link0] Link: DOWN event
2022-05-25T10:28:11+00:00 $host ppp 8727 - [meta sequenceId="60"] [opt2_link0] LCP: Down event
(this continues until I reboot the device)
If applicable, information from log files supporting your claim.
Additional context
This was initially happening very often.
We have a very similar setup with near identical configuration at another site, that is not experiencing this problem.
One difference we noted early is that VT-d
wasn't enabled in the BIOS. This is now enabled.
This seemed to have flipped something to turn off MSI:
# sysctl -a | grep MSI
re0: Using 1 MSI-X message
re0: turning off MSI enable bit.
With this the issue occurs only for downloads >300 mbps. Previously it happend far earlier.
Note: a software reboot causes MSI to be enabled again (no "turning off MSI enable bit" in the sysctl
output).
Only a full power cycle brings it back. Not sure if this is a bug in itself.
Environment
Software version used and hardware type if relevant, e.g.:
OPNsense 22.1.8_1-amd64 FreeBSD 13.0-STABLE OpenSSL 1.1.1o 3 May 2022 Intel(R) Celeron(R) CPU N3450 @ 1.10GHz (4 cores, 4 threads)
Before I forget:
re0@pci0:2:0:0: class=0x020000 rev=0x15 hdr=0x00 vendor=0x10ec device=0x8168 subvendor=0x10ec subdevice=0x0123
vendor = 'Realtek Semiconductor Co., Ltd.'
device = 'RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller'
class = network
subclass = ethernet
re1@pci0:3:0:0: class=0x020000 rev=0x15 hdr=0x00 vendor=0x10ec device=0x8168 subvendor=0x10ec subdevice=0x0123
vendor = 'Realtek Semiconductor Co., Ltd.'
device = 'RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller'
class = network
subclass = ethernet
And this blogpost reports problems with that NIC as well. For now I have shaped down the bandwidth to stop the issue (but will switch out the board soon to get my full bandwidth back).
Just for reference here are the re-related FreeBSD bugs and it seems to be a mess since 2012: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=166724 https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=186949 https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=208205
This issue has been automatically timed-out (after 180 days of inactivity).
For more information about the policies for this repository, please read https://github.com/opnsense/core/blob/master/CONTRIBUTING.md for further details.
If someone wants to step up and work on this issue, just let us know, so we can reopen the issue and assign an owner to it.