
Support large MTU

Open asbai opened this issue 3 years ago • 2 comments

OpenVPN supports large MTU sizes, up to 63000 bytes. According to our measurements, in many cases a large MTU makes I/O operations (whether on the tun/tap device or on the back-end UDP/TCP connections) more efficient, thereby effectively improving the throughput of VPN gateways.
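For comparison, this is roughly how the OpenVPN side can be configured (a minimal sketch, not our exact config; tun-mtu and mssfix are standard OpenVPN options, and the values here are only illustrative):

tun-mtu 63000
mssfix 0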

Although tinc supports the PMTU parameter, in our measurements setting it to any value greater than 1500 does not work properly. For example:

ClampMSS = no
PMTU = 63000
PMTUDiscovery = no

And then:

ifconfig vpn0 mtu 63000

This makes tinc need roughly 3 times the CPU overhead of OpenVPN to achieve the same throughput. (To exclude interference from other factors, we disabled encryption, compression, and hash/HMAC verification for both tinc and OpenVPN during the test, so only pure plaintext communication was measured.)
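For reference, a throughput measurement of this kind can be reproduced with a plain iperf3 run across the tunnel (the address below is only a placeholder for the peer's VPN address):

iperf3 -s                   # on the peer
iperf3 -c 10.0.0.1 -t 30    # on this side; traffic goes through vpn0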

Therefore, we hope that tinc can support large MTU values as well.

Thanks :-)

asbai avatar Oct 10 '21 23:10 asbai

Wouldn't an MTU larger than that of the underlying transmission medium cause fragmentation?

This is the list of MTUs for common media. https://en.wikipedia.org/wiki/Maximum_transmission_unit#MTUs_for_common_media

fangfufu avatar Nov 06 '21 23:11 fangfufu

Imagine what you would do when using TCP to send a large block of data (such as a video). Would you make each call to the send API match the size of one MTU (MSS)?

Compared with splitting a large piece of data into many small MSS-sized pieces and calling the send API thousands of times, it is obviously more efficient to submit one large buffer to the system's send interface, because the former causes a lot of kernel/user space switching (see the sketch below). Moreover, a user-mode app may not even be able to tell which physical device a message will leave from, and each device may have a different MTU configured.
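As a rough illustration (a sketch, not tinc code; the socket fd, buffer and MSS value are placeholders), the two approaches on an already-connected TCP socket look like this:

#include <stddef.h>      /* size_t */
#include <sys/socket.h>  /* send() */
#include <sys/types.h>   /* ssize_t */

/* One syscall per large buffer: the kernel and the NIC (TSO/GSO)
 * take care of segmenting the stream into MSS-sized frames. */
static ssize_t send_large(int fd, const char *buf, size_t len) {
    size_t off = 0;
    while (off < len) {
        ssize_t n = send(fd, buf + off, len - off, 0);
        if (n < 0)
            return -1;   /* caller inspects errno (EINTR, EAGAIN, ...) */
        off += (size_t)n;
    }
    return (ssize_t)off;
}

/* One syscall per MSS-sized chunk: functionally equivalent, but pays
 * a user/kernel transition for every ~1448 bytes of payload. */
static ssize_t send_chunked(int fd, const char *buf, size_t len, size_t mss) {
    size_t off = 0;
    while (off < len) {
        size_t chunk = (len - off < mss) ? len - off : mss;
        ssize_t n = send(fd, buf + off, chunk, 0);
        if (n < 0)
            return -1;
        off += (size_t)n;
    }
    return (ssize_t)off;
}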

Of course, an even better method is to use asynchronous I/O (such as io_uring on Linux or overlapped I/O + IOCP on Windows) so that the transfer is completed by the network card's DMA with zero copy, but I will not expand on that here.

As for splitting into MTU-sized packets when the data actually leaves the physical device, the kernel's protocol stack and the network card hardware (TCP offloading) already handle this in the best and most efficient way, without the user-space app needing to worry about it (unless we are using some kernel-bypass technology such as DPDK).
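As an illustration, these offloads can be inspected and toggled with ethtool (eth0 here is just a placeholder for the physical interface; availability depends on the NIC and driver):

ethtool -k eth0 | grep -E 'segmentation|offload'   # show current tso/gso/gro state
ethtool -K eth0 tso on gso on gro on               # enable them where supported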

Therefore, if tinc, as an ordinary user-space app, could send and receive more data in each I/O operation, it would significantly improve performance in many situations.

asbai avatar Nov 07 '21 00:11 asbai