cqueues

Support TCP Fast Open

Open daurnimator opened this issue 10 years ago • 9 comments

TCP Fast Open (TFO) allows for the reduction of round trips while setting up a TCP connection by including data in the initial SYN packet.

On the server side, this means adding support for the SOL_TCP option TCP_FASTOPEN. This should be quite simple, and no API change around :accept() should be needed.
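For illustration, here is a minimal sketch of what the server side might look like at the C level. The helper name `listen_with_tfo` is hypothetical and not part of cqueues. It uses IPPROTO_TCP, which has the same value as SOL_TCP on Linux, and it sets the option after listen() on the assumption that older kernels only accept it on a listening socket.

```c
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper (not a cqueues API): open a loopback listener and try
 * to enable TCP Fast Open on it. Returns the listening fd, or -1 if the
 * socket could not be set up. *tfo_enabled is set to 1 only when the kernel
 * accepted the TCP_FASTOPEN option. */
int listen_with_tfo(unsigned short port, int qlen, int *tfo_enabled)
{
    struct sockaddr_in addr = {0};
    int fd;

    assert(tfo_enabled != NULL);
    *tfo_enabled = 0;

    fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(port); /* 0 asks the kernel for an ephemeral port */

    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(fd, SOMAXCONN) < 0) {
        close(fd);
        return -1;
    }

#ifdef TCP_FASTOPEN
    /* On Linux, qlen bounds the queue of pending TFO connections. A failure
     * here is not fatal: the socket keeps working as a plain listener. */
    if (setsockopt(fd, IPPROTO_TCP, TCP_FASTOPEN, &qlen, sizeof qlen) == 0)
        *tfo_enabled = 1;
#endif
    return fd;
}
```

The accept() path is unchanged, which matches the expectation that no API change around :accept() is needed.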

On the client side, we need to use sendto with MSG_FASTOPEN. Lua API-wise, I suggest that if :write is called before :connect, we automatically use TFO when available (this can give a speedup to all applications!), possibly with an explicit option to turn it off.
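A rough sketch of the client side in C, assuming a blocking socket. The helper name `connect_and_send` is made up for this example, and the set of errno values that trigger the fallback is my assumption about how a kernel without usable TFO would respond, not something taken from cqueues.

```c
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: connect and send the first bytes in a single call
 * when MSG_FASTOPEN is usable, otherwise fall back to connect() + send().
 * Returns the number of bytes queued, or -1 with errno set. */
ssize_t connect_and_send(int fd, const struct sockaddr *sa, socklen_t salen,
                         const void *buf, size_t len)
{
    assert(sa != NULL && buf != NULL);
#ifdef MSG_FASTOPEN
    {
        /* sendto() on an unconnected TCP socket performs the connect too.
         * MSG_NOSIGNAL avoids SIGPIPE if the kernel ignores the flag. */
        ssize_t n = sendto(fd, buf, len, MSG_FASTOPEN | MSG_NOSIGNAL, sa, salen);
        if (n >= 0)
            return n;
        /* Assumption: these errors mean TFO is disabled or unsupported and
         * the socket is still unconnected, so a normal connect is safe. */
        if (errno != EOPNOTSUPP && errno != EPIPE && errno != ENOTCONN)
            return -1;
    }
#endif
    if (connect(fd, sa, salen) < 0)
        return -1;
    return send(fd, buf, len, 0);
}
```

On a non-blocking socket the sendto() call can fail with EINPROGRESS, much like connect(), so a real implementation would have to handle that case separately.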

Support

  • Linux: available but turned off for IPv4 by default since 3.6 (clients), 3.7 (servers). On by default since 3.13. IPv6 support added in 3.16.
  • Mac: Coming in iOS 9.0 and OSX 10.11
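On Linux, whether TFO is switched on is controlled by the net.ipv4.tcp_fastopen sysctl, a bitmask in which 0x1 enables the client side and 0x2 the server side. A small sketch of checking it at runtime follows; the function and macro names are made up for this example.

```c
#include <assert.h>
#include <stdio.h>

#define TFO_CLIENT 0x1 /* bit 0: client side enabled */
#define TFO_SERVER 0x2 /* bit 1: server side enabled */

/* Hypothetical helper: read the sysctl bitmask from the given path
 * (normally /proc/sys/net/ipv4/tcp_fastopen). Returns -1 if the file is
 * missing or unparsable, e.g. on a non-Linux system. */
int tfo_sysctl_read(const char *path)
{
    FILE *f;
    int mask;

    assert(path != NULL);
    f = fopen(path, "r");
    if (f == NULL)
        return -1;
    if (fscanf(f, "%d", &mask) != 1)
        mask = -1;
    fclose(f);
    return mask;
}

/* Each returns 1 if the corresponding bit is set in a valid mask, else 0. */
int tfo_client_enabled(int mask) { return mask >= 0 && (mask & TFO_CLIENT); }
int tfo_server_enabled(int mask) { return mask >= 0 && (mask & TFO_SERVER); }
```

A caller would pass the result of `tfo_sysctl_read("/proc/sys/net/ipv4/tcp_fastopen")` to the two predicates and skip the TFO code path when they return 0.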

Useful links:

  • https://lwn.net/Articles/508865/
  • https://tools.ietf.org/html/rfc7413
  • https://bradleyf.id.au/nix/shaving-your-rtt-wth-tfo/

daurnimator • Feb 13 '15 06:02

This seems like a solution in search of a problem. TCP_CORK/TCP_NOPUSH already permits bundling data with the SYN packet. (On recent Linux kernels "autocorking" is the default.) The only edge TCP_FASTOPEN seems to have is that the server application can dequeue the data and reply before the ACK. The flip side to that coin is that a server may see two identical connections under some circumstances. ugh

That said, I see an opportunity to simplify things. In order to support this for SSL we'd need to instantiate a custom BIO. If we did that I think so_read and so_write could be simplified, removing much of the #ifdef and SIGPIPE conditionals. And it might even pave the way for server-side DTLS support.

As a means to an end I like the idea of supporting this, even though I think the feature sucks.

wahern • Jun 17 '15 16:06

> TCP_CORK/TCP_NOPUSH already permits bundling data with the SYN packet.

It does? Doesn't it conceptually require a write before connect returns?

Or does it work something like this, where you:

  • set non-blocking
  • cork
  • connect (returning EINPROGRESS)
  • write
  • uncork

==> and it then sends the data in the SYN? That would still be weird, as you'd be writing while the fd does not yet poll as writable. (Completion of connect is normally detected by polling for writability.)

> (On recent Linux kernels "autocorking" is the default.)

Ah! I didn't know this. That's cool :)

  • http://www.phoronix.com/scan.php?page=news_item&px=MTU4Mjk

> In order to support this for SSL we'd need to instantiate a custom BIO. If we did that I think so_read and so_write could be simplified, removing much of the #ifdef and SIGPIPE conditionals.

That sounds good :)

daurnimator • Jun 18 '15 00:06

Yeah, I think I was mistaken. I was probably thinking of this:

> By combining the TCP_CORK, TCP_DEFER_ACCEPT, and TCP_QUICKACK options, the number of packets participating in each HTTP transaction will be reduced to a minimal acceptable level (as required by TCP protocol requirements and security considerations). The result is not only fast data transfer and request processing but also minimized client-server two-way latency.

Source: http://www.techrepublic.com/article/take-advantage-of-tcp-ip-options-to-optimize-data-transmission/

wahern • Jun 18 '15 01:06

On OSX for client side, you use connectx with some new (as of iOS 9.0 or OSX 10.11) flags:

connectx(fd, ..., CONNECT_DATA_IDEMPOTENT | CONNECT_RESUME_ON_READ_WRITE, ...); // SYN delayed
write(fd, ...); // SYN goes out with first data segment

Source: http://devstreaming.apple.com/videos/wwdc/2015/719ui2k57m/719/719_your_app_and_next_generation_networks.pdf?dl=1 Page 84

I can't find much other information on it. The iOS 9.0 pre-release notes list:

  • Added CONNECT_DATA_IDEMPOTENT
  • Added CONNECT_RESUME_ON_READ_WRITE
  • Added connectx(_: Int32, _: UnsafePointer<sa_endpoints_t>, _: sae_associd_t, _: UInt32, _: UnsafePointer, _: UInt32, _: UnsafeMutablePointer<Int>, _: UnsafeMutablePointer<sae_connid_t>) -> Int32
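Going by that signature, the C call might look roughly like the sketch below. It is untested on real hardware and rests on assumptions: the sa_endpoints_t field names and SAE_ASSOCID_ANY come from my reading of the connectx(2) man page, and the helper name `connect_deferred` is made up. On other platforms it simply fails with ENOSYS.

```c
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: start a deferred connect so that the SYN goes out
 * together with the first write(). Returns 0 on success, or -1 with errno
 * set (ENOSYS where connectx and its TFO flags are unavailable). */
int connect_deferred(int fd, const struct sockaddr *dst, socklen_t dstlen)
{
    assert(dst != NULL);
#if defined(__APPLE__) && defined(CONNECT_DATA_IDEMPOTENT)
    {
        sa_endpoints_t ep;

        /* Zeroed source fields leave source address selection to the kernel
         * (assumed from the man page). */
        memset(&ep, 0, sizeof ep);
        ep.sae_dstaddr = dst;
        ep.sae_dstaddrlen = dstlen;

        /* No iovec is passed; the data is sent by a later write(). */
        return connectx(fd, &ep, SAE_ASSOCID_ANY,
                        CONNECT_DATA_IDEMPOTENT | CONNECT_RESUME_ON_READ_WRITE,
                        NULL, 0, NULL, NULL);
    }
#else
    (void)fd;
    (void)dstlen;
    errno = ENOSYS;
    return -1;
#endif
}
```

After a successful call, the first write(fd, ...) is what should actually trigger the SYN carrying the data.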

daurnimator • Jun 18 '15 02:06

@daurnimator I tried the new connectx() API on OS X 10.11 beta 5, only to find that it always returns with error Invalid argument. I tried different arguments, but no luck so far.

https://github.com/shadowsocks/shadowsocks/issues/399#issuecomment-126280454

Have you worked out how to pass valid arguments to connectx? Thanks!

clowwindy • Aug 12 '15 08:08

> Have you worked out how to pass valid arguments to connectx? Thanks!

Sorry, I don't own any Apple devices to develop with (I can only occasionally borrow a coworker's Mac for a quick smoke test), and hence don't have the unreleased OSX/iOS available to experiment with at all.

Looking at your code, you seem to have gotten further than I can with the online docs I've found (e.g. I haven't found a definition of sa_endpoints_t; in fact, your linked comment is the only result on Google outside of Apple).

daurnimator • Aug 12 '15 10:08

@clowwindy, sorry to hear about shutting down the shadowsocks project :(

> your linked comment is the only result on google outside of apple

Google cache archive of the mention: https://webcache.googleusercontent.com/search?q=cache:-CTj4hqsa5AJ:https://github.com/shadowsocks/shadowsocks/issues/399+&cd=1&hl=en&ct=clnk&gl=us

I'm not sure how long the Google cache keeps entries, though.

@clowwindy, where did you find out that info?

daurnimator • Aug 25 '15 04:08

@daurnimator I read man connectx on OS X 10.11 and wrote a demo. But it doesn't work.

Hope Apple will publish more information after the final release of 10.11.

clowwindy • Aug 25 '15 05:08

@clowwindy thanks. I see you even got your example working :) FWIW, I've mirrored it here

daurnimator • Sep 30 '15 01:09