onetun

Improve bandwidth

Open aramperes opened this issue 2 years ago • 4 comments

I tested sftp tunneling with some large files and found that each connection is capped at around 2MB/s. Naturally this depends on the latency and bandwidth to the WireGuard router, but with the official kernel implementation I get around 20MB/s for the same endpoint.

$ sftp [email protected]
# 20MB/s

$ sftp -P 2222 [email protected]
# onetun: 2.4MB/s

Parity with the kernel is not a goal since userspace will be a bit slower, but I would like to aim for something like ~50% instead of ~10%.

aramperes avatar Jan 08 '22 09:01 aramperes

Hello @aramperes, I may have encountered the same issue with smoltcp, which gets much lower bandwidth than the system network stack. Do you think the key problem is in smoltcp's design, or are there improvements that could be made in the application?

zonyitoo avatar Jan 19 '22 07:01 zonyitoo

@zonyitoo A few thoughts:

  • According to smoltcp's benchmarks, it should be able to handle multiple Gbps, so I doubt the core library is the main issue.
  • Tokio's design clashing with smoltcp might be, however: the Device implementation needs to read from a local buffer, so there is some locking in onetun as well:

https://github.com/aramperes/onetun/blob/648154b5ee89b4311f24a4827847ead765e8533f/src/virtual_device.rs#L28-L41

  • The Bus design in onetun was meant to simplify the code and make it more extensible (for example for pcap, reverse tunneling, etc.). Since every consumer has to read each message before going to the next one, I'm sure targeted channels between the components would be way more efficient. That's how I wrote onetun originally; however, the number of channels needed between the different parts was getting out of hand. I may tweak the Bus design to allow for direct connections with that in mind.
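
To make the Bus trade-off above concrete, here is a minimal sketch using only the standard library. It is not onetun's code: `Event`, `Target`, `BroadcastBus`, and both helper functions are hypothetical names made up for illustration, and `std::sync::mpsc` stands in for whatever channel type the project actually uses. It only counts how many messages a TCP-side consumer has to look at in each design.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};

// Hypothetical event model; onetun's real events are different.
#[derive(Clone, Copy, PartialEq)]
enum Target {
    Tcp,
    Udp,
}

#[derive(Clone)]
struct Event {
    target: Target,
    payload_len: usize,
}

/// Broadcast-style bus: every subscriber gets a copy of every event.
struct BroadcastBus {
    subscribers: Vec<Sender<Event>>,
}

impl BroadcastBus {
    fn new() -> Self {
        Self { subscribers: Vec::new() }
    }

    fn subscribe(&mut self) -> Receiver<Event> {
        let (tx, rx) = channel();
        self.subscribers.push(tx);
        rx
    }

    fn publish(&self, event: &Event) {
        for tx in &self.subscribers {
            // Ignore subscribers that have gone away.
            let _ = tx.send(event.clone());
        }
    }
}

/// Returns (events inspected, relevant bytes) for a TCP consumer on the bus.
fn broadcast_inspected(events: &[Event]) -> (usize, usize) {
    let mut bus = BroadcastBus::new();
    let tcp_rx = bus.subscribe();
    let _udp_rx = bus.subscribe();
    for event in events {
        bus.publish(event);
    }
    drop(bus); // dropping the senders lets the receiver loop terminate

    let (mut inspected, mut bytes) = (0, 0);
    for event in tcp_rx.iter() {
        inspected += 1; // the consumer must look at every event...
        if event.target == Target::Tcp {
            bytes += event.payload_len; // ...but only some are for it
        }
    }
    (inspected, bytes)
}

/// Returns (events inspected, relevant bytes) with one channel per consumer.
fn targeted_inspected(events: &[Event]) -> (usize, usize) {
    let (tcp_tx, tcp_rx) = channel();
    let (udp_tx, _udp_rx) = channel();
    for event in events {
        // The producer routes each event to the one consumer that needs it.
        let _ = match event.target {
            Target::Tcp => tcp_tx.send(event.clone()),
            Target::Udp => udp_tx.send(event.clone()),
        };
    }
    drop(tcp_tx);

    let (mut inspected, mut bytes) = (0, 0);
    for event in tcp_rx.iter() {
        inspected += 1;
        bytes += event.payload_len;
    }
    (inspected, bytes)
}

fn main() {
    let events = vec![
        Event { target: Target::Tcp, payload_len: 100 },
        Event { target: Target::Udp, payload_len: 50 },
        Event { target: Target::Udp, payload_len: 50 },
        Event { target: Target::Tcp, payload_len: 200 },
    ];
    println!("broadcast: {:?}", broadcast_inspected(&events));
    println!("targeted:  {:?}", targeted_inspected(&events));
}
```

In the broadcast version the consumer inspects all four events to find its two; with targeted channels it only ever sees its own. The cost is the one mentioned above: the producer needs a sender per destination, which is where the channel count starts to grow.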

aramperes avatar Jan 19 '22 17:01 aramperes

Would this benefit from removing Tokio completely and switching to crossbeam? That should keep the architecture intact, but I don’t know how much that would improve performance.

Alternatively, crossfire should be an almost drop-in replacement for Tokio’s broadcast without having to remove the async runtime. Its README claims higher speeds than even Tokio’s single-consumer channels.

jkcoxson avatar Jun 24 '22 22:06 jkcoxson

Moving away from Tokio/async may indeed be an option; however, I would need to see clear benefits/benchmarks before making that decision. Maybe it's something worth testing in a branch. Either way, I don't think this would solve the locking issue in the Device implementation, which I suspect is the main bottleneck.
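
For anyone following along, this is roughly the kind of locking pattern in question, as a standard-library-only sketch. It is an assumption-laden illustration, not the actual Device code in onetun: `SharedQueue`, `read_per_packet`, and `read_batched` are hypothetical names, and draining the queue in one go is just one generic way to cut down on lock acquisitions, not a tested fix for this issue.

```rust
use std::collections::VecDeque;
use std::mem;
use std::sync::{Arc, Mutex};

type Packet = Vec<u8>;

/// Hypothetical shared receive buffer, standing in for the queue that a
/// smoltcp-style device reads from.
#[derive(Clone, Default)]
struct SharedQueue {
    inner: Arc<Mutex<VecDeque<Packet>>>,
}

impl SharedQueue {
    fn push(&self, packet: Packet) {
        self.inner.lock().unwrap().push_back(packet);
    }

    /// Takes the lock once per packet.
    fn pop_one(&self) -> Option<Packet> {
        self.inner.lock().unwrap().pop_front()
    }

    /// Takes the lock once and moves every queued packet out.
    fn drain_all(&self) -> VecDeque<Packet> {
        mem::take(&mut *self.inner.lock().unwrap())
    }
}

/// Builds a queue holding `n` dummy 64-byte packets.
fn filled(n: usize) -> SharedQueue {
    let queue = SharedQueue::default();
    for _ in 0..n {
        queue.push(vec![0u8; 64]);
    }
    queue
}

/// Returns (packets read, lock acquisitions) when popping one at a time.
fn read_per_packet(queue: &SharedQueue) -> (usize, usize) {
    let (mut packets, mut locks) = (0, 0);
    loop {
        locks += 1; // every pop attempt acquires the mutex
        match queue.pop_one() {
            Some(_) => packets += 1,
            None => break,
        }
    }
    (packets, locks)
}

/// Returns (packets read, lock acquisitions) when draining in one batch.
fn read_batched(queue: &SharedQueue) -> (usize, usize) {
    let batch = queue.drain_all();
    (batch.len(), 1)
}

fn main() {
    println!("per packet: {:?}", read_per_packet(&filled(1000)));
    println!("batched:    {:?}", read_batched(&filled(1000)));
}
```

The sketch only counts lock acquisitions; whether that count is what actually limits throughput here would still need profiling against the real code.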

crossfire doesn't seem maintained enough for me to make the switch, and hasn't been updated past Tokio 0.2.

aramperes avatar Jun 25 '22 17:06 aramperes