openvpn
openvpn copied to clipboard
The openvpn client suddenly disconnects
Apr 28 04:46:01 ubuntu openvpn[740]: SENT CONTROL [p-qcbj7-pe-cs-vpn-02]: 'PUSH_REQUEST' (status=1)
Apr 28 04:46:02 ubuntu openvpn[740]: PUSH: Received control message: 'PUSH_REPLY,dhcp-option DNS 10.132.252.4,dhcp-option DNS 10.132.253.9,compress lz4-v2,route 10.0.0.0 255.0.0.0,route-gateway 172.31.0.1,topology subnet,ping 10,ping-restart 60,ifconfig 172.31.5.35 255.255.0.0,peer-id 198,cipher AES-256-GCM'
Apr 28 04:46:02 ubuntu openvpn[740]: WARNING: Compression for receiving enabled. Compression has been used in the past to break encryption. Sent packets are not compressed unless "allow-compression yes" is also set.
Apr 28 04:46:02 ubuntu openvpn[740]: OPTIONS IMPORT: timers and/or timeouts modified
Apr 28 04:46:02 ubuntu openvpn[740]: OPTIONS IMPORT: compression parms modified
Apr 28 04:46:02 ubuntu openvpn[740]: LZ4v2 compression initializing
Apr 28 04:46:02 ubuntu openvpn[740]: OPTIONS IMPORT: --ifconfig/up options modified
Apr 28 04:46:02 ubuntu openvpn[740]: OPTIONS IMPORT: route options modified
Apr 28 04:46:02 ubuntu openvpn[740]: OPTIONS IMPORT: route-related options modified
Apr 28 04:46:02 ubuntu openvpn[740]: OPTIONS IMPORT: --ip-win32 and/or --dhcp-option options modified
Apr 28 04:46:02 ubuntu openvpn[740]: OPTIONS IMPORT: peer-id set
Apr 28 04:46:02 ubuntu openvpn[740]: OPTIONS IMPORT: adjusting link_mtu to 1625
Apr 28 04:46:02 ubuntu openvpn[740]: OPTIONS IMPORT: data channel crypto options modified
Apr 28 04:46:02 ubuntu openvpn[740]: Data Channel: using negotiated cipher 'AES-256-GCM'
Apr 28 04:46:02 ubuntu openvpn[740]: Data Channel MTU parms [ L:1553 D:1450 EF:53 EB:406 ET:0 EL:3 ]
Apr 28 04:46:02 ubuntu openvpn[740]: Outgoing Data Channel: Cipher 'AES-256-GCM' initialized with 256 bit key
Apr 28 04:46:02 ubuntu openvpn[740]: Incoming Data Channel: Cipher 'AES-256-GCM' initialized with 256 bit key
Apr 28 04:46:02 ubuntu openvpn[740]: Preserving previous TUN/TAP instance: tun0
Apr 28 04:46:02 ubuntu openvpn[740]: NOTE: Pulled options changed on restart, will need to close and reopen TUN/TAP device.
Apr 28 04:46:02 ubuntu openvpn[740]: net_route_v4_del: 10.0.0.0/8 via 172.31.0.1 dev [NULL] table 0 metric -1
Apr 28 04:46:02 ubuntu openvpn[740]: sitnl_send: rtnl: generic error (-1): Operation not permitted
Apr 28 04:46:02 ubuntu openvpn[740]: ERROR: Linux route delete command failed
Apr 28 04:46:02 ubuntu openvpn[740]: Closing TUN/TAP interface
Apr 28 04:46:02 ubuntu openvpn[740]: net_addr_v4_del: 172.31.4.119 dev tun0
Apr 28 04:46:02 ubuntu openvpn[740]: sitnl_send: rtnl: generic error (-1): Operation not permitted
Apr 28 04:46:02 ubuntu openvpn[740]: Linux can't del IP from iface tun0
Apr 28 04:46:02 ubuntu systemd-networkd[644]: tun0: Link DOWN
Apr 28 04:46:02 ubuntu systemd-networkd[644]: tun0: Lost carrier
Apr 28 04:46:03 ubuntu openvpn[740]: net_route_v4_best_gw query: dst 0.0.0.0
Apr 28 04:46:03 ubuntu openvpn[740]: net_route_v4_best_gw result: via 192.168.1.1 dev eth0
Apr 28 04:46:03 ubuntu openvpn[740]: ROUTE_GATEWAY 192.168.1.1/255.255.255.0 IFACE=eth0 HWADDR=d8:3a:dd:e6:d1:da
Apr 28 04:46:03 ubuntu openvpn[740]: ERROR: Cannot ioctl TUNSETIFF tun: Operation not permitted (errno=1)
Apr 28 04:46:03 ubuntu openvpn[740]: Exiting due to fatal error
Apr 28 04:46:03 ubuntu systemd[1]: [email protected]: Main process exited, code=exited, status=1/FAILURE
Apr 28 04:46:03 ubuntu systemd[1]: [email protected]: Failed with result 'exit-code'.
Apr 28 04:46:03 ubuntu systemd[1]: [email protected]: Consumed 10.331s CPU time.
This is my syslog,
power 735 1 0 16:52 ? 00:00:00 /usr/sbin/openvpn
this is Process user
The important line is probably: NOTE: Pulled options changed on restart, will need to close and reopen TUN/TAP device.
So it tried to preserve the TUN but it could not. And then it was missing permissions to close and reopen it. At this point it can do nothing else than give up and some higher layer (like the user or an init system) will need to restart it completely.
https://community.openvpn.net/openvpn/ticket/649#no1
I found this but my version is 2.5.5,In theory, this problem should not arise
This is the same problem area, though trac 649 was ONLY related to peer-id. In your log, we see
Apr 28 04:46:02 ubuntu openvpn[740]: OPTIONS IMPORT: --ifconfig/up options modified
Apr 28 04:46:02 ubuntu openvpn[740]: OPTIONS IMPORT: route options modified
so the server is sending different ifconfig/route statements in its PUSH_REPLY message (we can't check, as the log only shows the second PUSH_REPLY). If that happens, and the client has dropped privileges, there is nothing we can do but "exit".
You could upgrade to 2.6.10 + DCO (where OpenVPN needs to keep network privileges), do not use user in your config (= keep root privs), ensure the server is not changing options, or just make sure OpenVPN gets restarted.
But for this combination "new IP config requested and not enough privileges to do that" there is nothing we can do in the software, except clearly log it (which we do).
Ran into a similar problem last week, upon switching the servers from openvpn-2.4.12-1.el7.x86_64 (Centos 7.9.2009) to the newer openvpn-2.5.11-1.el9.x86_64 (RHEL9.5 / Rocky Linux 9.5) by means of copying configs and swapping the server IPs. All our OpenVPN 2.5.6 clients (some Debian-based embedded devices) re-tried just once and then crashed, they never came back and, painfully, required in-person visiting and power-cycling. Our v2.3 and v2.4 clients stayed online, however (Note: No NetworkManager used on any clients, including v2.5 that crashed).
Initially, the server SENT CONTROL, PUSH_REPLY as usual, but then suddenly got Connection reset, restarting [0], SIGUSR1[soft,connection-reset] received, client-instance restarting. Then lost all the 2.5.6 debian clients without any clue in the logs on the server side. Upon checking further, here's what happened on client side:
openvpn[1538]: Preserving previous TUN/TAP instance: tun0
openvpn[1538]: NOTE: Pulled options changed on restart, will need to close and reopen TUN/TAP device.
openvpn[1538]: net_route_v4_del: 192.168.10.0/24 via 10.10.10.1 dev [NULL] table 0 metric -1
openvpn[1538]: sitnl_send: rtnl: generic error (-1): Operation not permitted
openvpn[1538]: ERROR: Linux route delete command failed
openvpn[1538]: net_route_v4_del: xx.xx.xx.xx/32 via 10.0.0.1 dev [NULL] table 0 metric -1
openvpn[1538]: sitnl_send: rtnl: generic error (-1): Operation not permitted
openvpn[1538]: ERROR: Linux route delete command failed
openvpn[1538]: net_route_v4_del: 0.0.0.0/0 via 10.10.10.1 dev [NULL] table 0 metric -1
openvpn[1538]: sitnl_send: rtnl: generic error (-1): Operation not permitted
openvpn[1538]: ERROR: Linux route delete command failed
openvpn[1538]: net_route_v4_add: 0.0.0.0/0 via 10.0.0.1 dev [NULL] table 0 metric -1
openvpn[1538]: sitnl_send: rtnl: generic error (-1): Operation not permitted
openvpn[1538]: ERROR: Linux route add command failed
openvpn[1538]: Closing TUN/TAP interface
openvpn[1538]: net_addr_v4_del: 10.10.10.97 dev tun0
openvpn[1538]: sitnl_send: rtnl: generic error (-1): Operation not permitted
openvpn[1538]: Linux can't del IP from iface tun0
openvpn[1538]: net_route_v4_best_gw query: dst 0.0.0.0
openvpn[1538]: net_route_v4_best_gw result: via 10.0.0.1 dev eth0
openvpn[1538]: ERROR: Cannot ioctl TUNSETIFF tun: Operation not permitted (errno=1)
openvpn[1538]: Exiting due to fatal error
Please the comparison between 2.5 and 2.4 logs in the .txt file attached: 2.5-vs-2.4.txt
Looks like some implementation detail for handling the internal OpenVPN system routes has changed between 2.4 and 2.5 versions, triggering crash on some systems. Surprisingly, v2.3 and v2.4 clients survived (deleted the route and added the new one correctly). You can see the internal commands used by clients to add/delete routes differ wildly:
With v2.3 - v2.4 clients, you did:
/sbin/ip route del 192.168.10.0/24
which worked (against v2.5 server); but v2.5 clients now do:
net_route_v4_del: 192.168.10.0/24 via 10.10.10.1 dev [NULL] table 0 metric -1
that resulted in
sitnl_send: rtnl: generic error (-1): Operation not permitted
ERROR: Linux route delete command failed
This only occurred on the clients already connected to the server during the server switchover (via server IP swap).
Key question:
why did the internal OpenVPN system route change from 10.10.10.1 to 10.10.10.2 tunnel IP, between 2.4 and 2.5 server versions?
Before, the old server (v2.4) had:
10.10.10.0/24 dev tun0 proto kernel scope link src 10.10.10.1
After, the new server (2.5) has:
10.10.10.0/24 via 10.10.10.2 dev tun0
That seems odd. tun0 is the same (10.10.10.1) on both servers, and server.conf hasn't changed:
# Configure server mode and supply a VPN subnet
# for OpenVPN to draw client addresses from.
# The server will take 10.8.0.1 for itself,
# the rest will be made available to clients.
# Each client will be able to reach the server
# on 10.8.0.1. Comment this line out if you are
# ethernet bridging. See the man page for more info.
server 10.10.10.0 255.255.255.0
push "route-gateway 10.10.10.1"
route 10.10.10.0 255.255.255.0
The 10.10.10.0/24 tunnel route used to travel via tun0 10.10.10.1 address but in v2.5 it is now going via .10.2 for some reason, although tun0 IP hasn't changed (and this internal system route is what was pushed to the clients that failed). This 10.10.10.2 address has always been some kind of mystery internal IP that i've noticed a long time ago but never fully understood why OpenVPN needs this secondary address (the server's tun0 address is 10.10.10.1 afterall). Perhaps that is the address of a "virtual gateway" inside the openvpn software which routes to the appropriate remote node. For some reason OpenVPN alwas adds this .2 address rules to the routing tables on every VPN server, and once that route changed ever so slightly upon server version upgrade, it broke some clients that didn't permit net_route_v4_del / net_route_v4_add on the tun0 interface for some reason. (?)
Yes the implementation changed using native netlink instead of calling external tools. But it would be good if you can test and see if the problem still exists on 2.6.x.
The 10.10.10.0/24 tunnel route used to travel via tun0 10.10.10.1 address but in v2.5 it is now going via .10.2
It didn't. The old route had src 10.10.10.1 there, not via 10.10.10.2.
The main difference is that, from what I can see, 2.4 ifconfig'ed the whole /24 as on-net on the tun0 (which is why you have no via at all) while 2.5+ ifconfigs a /30 on the tun0, and routes the rest of the /24 to the (virtual) peer-ip, 10.0.0.2.
It does not matter at all, though, as tun0 has no next-hop resolution so all it matters is "will the packets go into tun0" - and they do, in both cases. So this part is really just looking different, but doing the same.
Neither has anything to do with the clients restarting or not (they are not "crashing" but "giving up", which is quite a bit of a difference). I think what you are hitting here is actually a bug in v2.3/v2.4 - errors in route handling, like "permission denied", were not properly caught. So OpenVPN never noticed that /sbin/ip failed to delete/add routes as requested and kept going - fortunate for your use case, but still broken behaviour.
v2.5+ uses netlink and catches errors, so when restarting and trying to re-init the tun (due to PUSH_REPLY option differences) it noticed and gave up.