Async::IO::Endpoint#connect hangs on Windows

Open emiltin opened this issue 3 years ago • 10 comments

Hi, we have an issue with Async::IO::Endpoint#connect on Windows.

On Mac, the method returns with Errno 61 (Connection refused) if the server does not respond. But on Windows, it blocks forever if the server does not respond within the first 2 seconds.

If server.rb is started first and then client.rb, it works on both Mac and Windows. But if client.rb is started first and server.rb is started 2 or more seconds later, it only works on Mac. On Windows, client.rb hangs and never connects.

client.rb:

require 'async'
require 'async/io'

Async do |task|
  endpoint = Async::IO::Endpoint.tcp('127.0.0.1', 13111)
  loop do
    puts "connecting to server"
    endpoint.connect
    puts 'connected'
    exit
  rescue StandardError => e
    task.sleep 1
  end
end

server.rb:

require 'async'
require 'async/io'

Async do
  endpoint = Async::IO::Endpoint.tcp('0.0.0.0', 13111)
  tasks = endpoint.accept do |socket|  # creates async tasks
    puts "client connected"
    exit
  end
  puts "waiting for client"
  tasks.each { |task| task.wait }
end

It might be related to TCP Retransmissions being used on Windows, but not on Mac. See https://github.com/rsmp-nordic/rsmp/issues/22

emiltin avatar Feb 04 '22 12:02 emiltin

@otterdahl

emiltin avatar Feb 04 '22 12:02 emiltin

I guess you should be able to reproduce this without async. That would be a good way to create a minimal repro; it may be a bug in Windows/CRuby.

ioquatix avatar Feb 04 '22 22:02 ioquatix

i'll try that, thanks

emiltin avatar Feb 05 '22 16:02 emiltin

the following straightforward way of connecting works fine on mac, linux and windows. does async use a particular method of connecting that i could use to try to reproduce the issue on windows without async?

require 'socket'

# client
client_thread = Thread.new do
  5.times do
    puts "client: connecting"
    socket = TCPSocket.new 'localhost', 13111
    puts "client: connected"
    break
  rescue StandardError => e
    puts "could not connect: #{e.inspect}"
    sleep 1
  end
end

# server
server_thread = Thread.new do
  puts "server: delay before starting"
  sleep 3
  server = TCPServer.new 13111
  puts "server: waiting for client"
  client = server.accept 
  puts "server: client connected"
end

server_thread.join
client_thread.join

puts "done"

emiltin avatar Feb 06 '22 18:02 emiltin

The following, which doesn't use Async, works on ubuntu and mac, but fails on windows:

require 'socket'

client_thread = Thread.new do
  loop do
    puts "client: trying to connect to server"
    socket = Socket.new Socket::AF_INET, Socket::SOCK_STREAM
    socketaddr = Socket.pack_sockaddr_in 12111, '127.0.0.1'
    socket.connect_nonblock socketaddr, exception: true
    puts "client: connected to server"
    break
  rescue StandardError => e
    puts "client: error while connecting: #{e.inspect}"
    sleep 1
  end
end

server_thread = Thread.new do
  delay = 4
  puts "server: initial delay of #{delay}s"
  sleep delay

  server = TCPServer.new 12111
  puts 'server: waiting for client to connect'
  client = server.accept 
  puts "server: client connected - success"
  exit
end

timeout_thread = Thread.new do
  timeout = 10
  sleep timeout
  puts "timeout: client didn't connect within #{timeout}s - failure"
  exit 1
end

timeout_thread.join

So it seems this is a problem in Ruby on Windows, not Async.
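
For comparison, the Ruby documentation for Socket#connect_nonblock completes a pending connection by waiting for the socket to become writable and then calling connect_nonblock a second time to learn the outcome. The script above instead rescues the pending state (IO::WaitWritable is a StandardError) and starts over with a new socket. Below is a minimal sketch of the documented pattern. The helper name `connect_with_timeout` and its `timeout:` parameter are hypothetical, and this is not a verified fix for the Windows behaviour discussed here, only a second data point to try.

```ruby
require 'socket'

# Hypothetical helper, not part of async-io or of the scripts in this thread.
# Returns a connected Socket, or raises a SystemCallError subclass
# (e.g. Errno::ECONNREFUSED or Errno::ETIMEDOUT).
def connect_with_timeout(host, port, timeout: 2)
  socket = Socket.new(Socket::AF_INET, Socket::SOCK_STREAM)
  address = Socket.pack_sockaddr_in(port, host)

  begin
    socket.connect_nonblock(address)
  rescue IO::WaitWritable
    # The attempt is in progress: wait until the socket becomes writable.
    unless IO.select(nil, [socket], nil, timeout)
      raise Errno::ETIMEDOUT, "no response from #{host}:#{port} within #{timeout}s"
    end

    begin
      # A second call reports how the attempt ended.
      socket.connect_nonblock(address)
    rescue Errno::EISCONN
      # Already connected: the handshake completed.
    end
  end

  socket
rescue SystemCallError
  # Do not leak the file descriptor when the attempt fails.
  socket&.close
  raise
end
```

A client loop could call `connect_with_timeout('127.0.0.1', 12111)` inside the existing `rescue StandardError` retry block in place of the bare connect_nonblock call.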

I can invite you to a repo if you want to have a look at the GitHub Actions results. It also includes a script that uses Async, which also fails on Windows.

emiltin avatar Feb 10 '22 09:02 emiltin

I would like to address this but don't have the capacity right now. Can we park this discussion for now, at least with respect to going deeper? I appreciate all the notes and reproduction scripts. Later in the year, hopefully, we can start thinking about Windows support. It might make sense for us to use Windows IO Uring support and try to address this correctly with proper non-blocking IO.

Alternatively, if you want to have a go at making io-event support Windows IO Uring, it would be an awesome starting point for a contribution, even if it was just a rough PoC.

ioquatix avatar Feb 18 '22 09:02 ioquatix

I have some code which triggers this error on Windows, but not on Mac or Linux. Could this be related?

An established connection was aborted by the software in your host machine. in task: #<Async::Task:0x884 reader (running)>
<internal:io>:63:in `read_nonblock'
D:/a/rsmp_validator/rsmp_validator/vendor/bundle/ruby/3.1.0/gems/async-io-1.33.0/lib/async/io/generic.rb:216:in `async_send'
D:/a/rsmp_validator/rsmp_validator/vendor/bundle/ruby/3.1.0/gems/async-io-1.33.0/lib/async/io/generic.rb:69:in `block in wrap_blocking_method'
D:/a/rsmp_validator/rsmp_validator/vendor/bundle/ruby/3.1.0/gems/async-io-1.33.0/lib/async/io/stream.rb:261:in `fill_read_buffer'
D:/a/rsmp_validator/rsmp_validator/vendor/bundle/ruby/3.1.0/gems/async-io-1.33.0/lib/async/io/stream.rb:131:in `read_until'
D:/a/rsmp_validator/rsmp_validator/vendor/bundle/ruby/3.1.0/gems/async-io-1.33.0/lib/async/io/protocol/line.rb:51:in `read_line'

emiltin avatar Sep 30 '22 08:09 emiltin

Is "An established connection was aborted by the software in your host machine" a message you print out or is it from the OS?

ioquatix avatar Sep 30 '22 10:09 ioquatix

It's not a text I've defined, I think it's the message of an exception coming from async-io?
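
One way to check where such a message originates is to look at the exception class. In Ruby, errors reported by the operating system are subclasses of SystemCallError (the Errno::* classes) and carry an errno, with message text supplied by the platform. The wording above matches the Windows description of a connection-aborted socket error, which Ruby would normally surface as Errno::ECONNABORTED. A small sketch, where `describe_error` is a hypothetical helper name:

```ruby
# Hypothetical helper: tells OS-level errors apart from application errors.
def describe_error(error)
  if error.is_a?(SystemCallError)
    # Errno::* exceptions wrap an error code returned by the operating system.
    "OS error #{error.class} (errno #{error.errno}): #{error.message}"
  else
    "application error #{error.class}: #{error.message}"
  end
end
```

Calling `describe_error(e)` in a rescue block around the failing read would show whether the exception is an Errno::* class or something raised by a library.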

emiltin avatar Sep 30 '22 12:09 emiltin

but ruby on windows is not yet expected to support non-blocking io, right?

emiltin avatar Sep 30 '22 12:09 emiltin