[2.6 beta1 w/ dco] SIGUSR1 causing crash. persist-remote-ip but with static remote ip specified(?)

Open Originalimoc opened this issue 3 years ago • 22 comments

Describe the bug

with persist-tun:

2022-12-07 16:13:31 UDPv4 link local: (not bound)
2022-12-07 16:13:31 UDPv4 link remote: [AF_INET]**127.0.0.1:12345**
2022-12-07 16:13:31 dco_do_write: netlink reports error (-1): Unspecific failure
2022-12-07 16:13:31 dco_do_write: failed to send netlink message: No route to host (-113)
2022-12-07 16:13:31 write UDPv4 []: Success (fd=4,code=0)
2022-12-07 16:13:33 dco_do_write: netlink reports error (-1): Unspecific failure
2022-12-07 16:13:33 dco_do_write: failed to send netlink message: No route to host (-113)
2022-12-07 16:13:33 write UDPv4 []: Success (fd=4,code=0)

Then it gets stuck without crashing; a SIGHUP can "fix" it.

without persist-tun:

2022-12-07 16:13:31 UDPv4 link local: (not bound)
2022-12-07 16:13:31 UDPv4 link remote: [AF_INET]127.0.0.1:12345

crashed

To Reproduce: Ubuntu 22.04, DCO enabled. Establish the tunnel, then send SIGUSR1 to the client.

client config:

remote 127.0.0.1 12345
persist-local-ip
persist-remote-ip
persist-tun
persist-key

Expected behavior: 2.5.8 works fine. With persist-remote-ip removed, 2.6 beta1 works as well as 2.5.8.

Version information (please complete the following information):

  • OS: Ubuntu 22.04
  • OpenVPN version: 2.6 beta1

Originalimoc avatar Dec 07 '22 08:12 Originalimoc

That client config looks incomplete - please show the full client config (without the actual key material, of course).

Is this part of a real setup? Talking to 127.0.0.1 would not work normally for a production setup.

cron2 avatar Dec 07 '22 08:12 cron2

Yes, 127.0.0.1:12345 is another tunnel; the route to the actual remote is manually bypassed.

Originalimoc avatar Dec 07 '22 08:12 Originalimoc

"yes" is not an answer to "please show full client config" - if I'm to fix this issue, I need to understand what you are trying to achieve.

cron2 avatar Dec 07 '22 08:12 cron2

#persist-tun
proto udp
tun-mtu 1428
remote 127.0.0.1 12345
persist-local-ip
explicit-exit-notify 2
connect-retry 1 3
client
nobind
allow-compression no
data-ciphers AES-128-GCM
auth-nocache
script-security 2
verb 3
route-up /etc/openvpn/route-up.sh
route-pre-down /etc/openvpn/route-pre-down.sh
remote-cert-tls server
tls-crypt /etc/openvpn/tlscrypt.key
ca /etc/openvpn/ca.crt
cert /etc/openvpn/c.crt
key /etc/openvpn/k.key
persist-key
ping 0
ping-restart 3600
replay-window 5000 3
mute 8
mlock
fast-io

Originalimoc avatar Dec 07 '22 08:12 Originalimoc

A very generic config

Originalimoc avatar Dec 07 '22 08:12 Originalimoc

I'll move on without persist-remote-ip because the remote is not a domain name anyway, but there is a bug somewhere.

Originalimoc avatar Dec 07 '22 08:12 Originalimoc

> A very generic config

There is a lot of stuff in your config that makes little sense, like fast-io, or ping-restart 3600 together with ping 0. Also, persist-local-ip only makes sense in a config where you have bind and local $domainname... and auth-nocache makes no sense if you don't actually have any cacheable authentication enabled.

That said, it's still likely that the combination of --persist-remote-ip and DCO is broken, so we'll have a very close look.

cron2 avatar Dec 07 '22 08:12 cron2
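
The remarks above can be turned into a small config check. This is only a sketch in Python: the `review` and `parse_directives` helpers are hypothetical names, the rules encode nothing beyond the three combinations criticised in this thread, and treating `auth-user-pass` and `askpass` as the "cacheable authentication" directives is an assumption. It is not how OpenVPN itself validates a config, and it does not handle inline comments or quoting.

```python
def parse_directives(config_text):
    """Map each directive name to a list of its argument lists.

    Blank lines and lines starting with '#' or ';' are skipped.
    """
    directives = {}
    for raw in config_text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue
        name, *args = line.split()
        directives.setdefault(name, []).append(args)
    return directives


def first_arg(directives, name, default=None):
    """Return the first argument of the last occurrence of a directive."""
    occurrences = directives.get(name)
    if not occurrences or not occurrences[-1]:
        return default
    return occurrences[-1][0]


def review(config_text):
    """Return notes for the option combinations questioned in this thread."""
    d = parse_directives(config_text)
    notes = []

    # ping and ping-restart both default to 0 (off), per the thread.
    if first_arg(d, "ping-restart", "0") != "0" and first_arg(d, "ping", "0") == "0":
        notes.append("ping-restart is set while ping is 0")

    # persist-local-ip was said to make sense only with bind and local.
    if "persist-local-ip" in d and ("nobind" in d or "local" not in d):
        notes.append("persist-local-ip without bind and local")

    # Assumption: these two directives stand in for cacheable authentication.
    if "auth-nocache" in d and not ("auth-user-pass" in d or "askpass" in d):
        notes.append("auth-nocache without cacheable authentication")

    return notes
```

Calling `review()` on the full client config posted earlier would report all three notes; a config containing only a `remote` line reports none.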

I don't want a static ping interval (to avoid some detection), and ping-restart 3600 could just as well be ping-restart 360000 or whatever, because it will almost never get triggered. auth-nocache is left over from the password auth I used before cert + TLS. No idea what fast-io actually does. Does OpenVPN not use sendmmsg(?) for some reason?

Originalimoc avatar Dec 07 '22 09:12 Originalimoc

Both ping and ping-restart default to 0 = off.

sendmmsg() is not used because nobody has coded it yet, but with DCO this point has become moot: the data path can now go into the kernel, and userland packet handling can stay as it is (a single packet at a time). Initial experiments with sendmmsg() did not result in much of an improvement, but did make the code more complex.

Another question, tho - what is listening on 127.0.0.1:12345? Is this a local OpenVPN instance?

cron2 avatar Dec 07 '22 09:12 cron2

Yes, with DCO this doesn't matter anymore; DCO is even multithreaded(?). 12345 is an obfuscation tunnel. The actual remote is 100ms away, and this has worked pretty well for 5+ years. I'm not sure whether persist-local-ip prevents the local endpoint from switching ports, because a port switch causes the 12345 tunnel to be renegotiated.

Originalimoc avatar Dec 07 '22 09:12 Originalimoc

the localport can be specified with --lport and it will remain static.

ordex avatar Dec 07 '22 10:12 ordex

> the localport can be specified with --lport and it will remain static.

as long as there is nobind, nobody cares about lport...

cron2 avatar Dec 07 '22 10:12 cron2
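
The difference between an unbound and a bound local port can be illustrated with plain UDP sockets. This is a sketch in Python of generic socket behaviour, not of OpenVPN's own code; `local_port` and `pick_free_port` are hypothetical helper names. An unbound socket gets whatever ephemeral port the OS picks each time it is opened, while a socket bound to a fixed port keeps that port across re-creation.

```python
import socket

# Endpoint from the thread; nothing needs to listen there for this demo.
REMOTE = ("127.0.0.1", 12345)


def local_port(fixed_port=None):
    """Open a UDP socket towards REMOTE and return its local port.

    With fixed_port=None the OS chooses an ephemeral port (the unbound case);
    otherwise the socket is bound to the given port first.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        if fixed_port is not None:
            s.bind(("127.0.0.1", fixed_port))
        # connect() on a UDP socket sends no packet; it only fixes the peer.
        s.connect(REMOTE)
        return s.getsockname()[1]


def pick_free_port():
    """Ask the OS for a currently unused UDP port on the loopback address."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.bind(("127.0.0.1", 0))
        return s.getsockname()[1]
```

Calling `local_port(p)` repeatedly with the same `p` returns `p` every time, whereas `local_port()` returns whatever the OS hands out on each call, which is the kind of port change an intermediate tunnel would notice.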

LOL, I checked the log and it did change, but it's so fast that I never noticed. I'll change it after the next maintenance XD.

Originalimoc avatar Dec 07 '22 10:12 Originalimoc

running my client this way, but can't generate any crash:

openvpn --dev tun \
--client \
--ca ../../test-pki/pki/ca.crt \
--cert ../../test-pki/pki/issued/client1.crt \
--key ../../test-pki/pki/private/client1.key \
--verb 3 \
--persist-local-ip \
--persist-remote-ip \
--persist-tun \
--persist-key \
--fast-io \
--connect-retry 1 3 \
--nobind \
--allow-compression no \
--auth-nocache \
--script-security 2 \
--ping 0 \
--ping-restart 3600 \
--replay-window 5000 3 \
--mlock \
--tun-mtu 1428 \
--remote 127.0.0.1 12345

maybe it is also related to what is listening on the other side?

ordex avatar Dec 07 '22 10:12 ordex

I get

2022-12-07 11:43:34 UDPv4 link local: (not bound)
2022-12-07 11:43:34 UDPv4 link remote: [AF_INET]127.0.0.1:12345
2022-12-07 11:43:34 read UDPv4 [ECONNREFUSED]: Connection refused (fd=3,code=111)

but even if I add some dumb listener on that port, I get nothing

2022-12-07 11:49:38 UDPv4 link local: (not bound)
2022-12-07 11:49:38 UDPv4 link remote: [AF_INET]127.0.0.1:12345

ordex avatar Dec 07 '22 10:12 ordex

Are you doing it in this order: first get an established tunnel, then send a SIGUSR1?

Originalimoc avatar Dec 07 '22 11:12 Originalimoc

oh ok, I missed that. thanks

ordex avatar Dec 07 '22 11:12 ordex
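
The ordering that matters for the reproduction (tunnel established first, SIGUSR1 second) can be sketched as follows. This is a minimal Python sketch with hypothetical helper names; it assumes the client log contains OpenVPN's usual "Initialization Sequence Completed" line once the tunnel is up, and that the caller already knows the client's PID.

```python
import os
import signal

# Log line OpenVPN prints once the tunnel is fully set up.
ESTABLISHED_MARKER = "Initialization Sequence Completed"


def tunnel_established(log_lines):
    """Return True if any log line shows that the tunnel came up."""
    return any(ESTABLISHED_MARKER in line for line in log_lines)


def restart_if_established(pid, log_lines, kill=os.kill):
    """Send SIGUSR1 to pid only after the tunnel is established.

    Returns True if the signal was sent, False otherwise. The kill
    parameter exists so the function can be exercised without a real
    OpenVPN process.
    """
    if not tunnel_established(log_lines):
        return False
    kill(pid, signal.SIGUSR1)
    return True
```

Sending the signal before the marker appears exercises a different code path, which may be why the first reproduction attempts above showed nothing.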

Huh, it turns out it's a bad idea to use a static local port. Because the server does not get an explicit-exit-notify when the client receives a SIGUSR1, it waits until the TLS timeout (around 15s in my config) if the port is the same. Expected behavior or bug?

Originalimoc avatar Dec 07 '22 11:12 Originalimoc

Running killall openvpn on the server gives this:

2022-12-07 11:26:09 event_wait : Interrupted system call (fd=-1,code=4)
2022-12-07 11:26:09 SENT CONTROL [Client]: 'RESTART' (status=1)
2022-12-07 11:26:11 Closing DCO interface

but the client receives nothing...?

Originalimoc avatar Dec 07 '22 11:12 Originalimoc

please do not mix unrelated problems in one GH issue. Also, you talk about "client getting SIGUSR1" and show a log file from the server getting a SIGINT. Please open a new issue about the --explicit-exit-notify problem, and include client and server logs, with --verb 3 each.

cron2 avatar Dec 07 '22 11:12 cron2

I'd like to do so. Later.

Originalimoc avatar Dec 07 '22 11:12 Originalimoc

Can we close this ticket? I don't think there has been any other report about this issue, and it was reported against what is by now a very old version.

ordex avatar Jun 04 '25 11:06 ordex