
starttls issue (unable to update event disposition: No such file or directory)

Open luveti opened this issue 4 years ago • 4 comments

This is an example from the documentation, with a snippet from daurnimator's lua-http library, and is the minimal code needed to reproduce the issue I'm seeing in my application.

local ce = require "cqueues.errno"
local cqueues = require "cqueues"
local socket = require "cqueues.socket"

local cq = cqueues.new()

-- copied from https://git.io/JYqpM
local function onerror(socket, op, why, lvl)
	local err = string.format("%s: %s", op, ce.strerror(why))
	if op == "starttls" then
		local ssl = socket:checktls()
		if ssl and ssl.getVerifyResult then
			local code, msg = ssl:getVerifyResult()
			if code ~= 0 then
				err = err .. ":" .. msg
			end
		end
	end
	if why == ce.ETIMEDOUT then
		if op == "fill" or op == "read" then
			socket:clearerr("r")
		elseif op == "flush" then
			socket:clearerr("w")
		end
	end
	return err, why
end

local function send_request()
	local http = socket.connect("google.com", 443)
	http:onerror(onerror)
	local ok, err, errno = http:starttls()
	if not ok then
		-- Note: calling http:close() here causes a different error to occur (Bad file descriptor)
		return nil, err, errno
	end
	http:write("GET / HTTP/1.0\n")
	http:write("Host: google.com:443\n\n")

	local status = http:read()
	print("!", status)
	for ln in http:lines "*h" do
		print("|", ln)
	end

	local empty = http:read "*L"
	print "~"

	for ln in http:lines "*L" do
		io.stdout:write(ln)
	end
	http:close()
end

cq:wrap(function()
	while true do
		print(send_request())
		cqueues.sleep(0.5)
	end
end)

cq:wrap(function()
	while true do
		print(send_request())
		cqueues.sleep(1)
	end
end)

print(cq:loop())

Note: The contents of onerror don't affect this issue, but give a nice error message.

Output:

nil     starttls: Network is unreachable        101
nil     starttls: Network is unreachable        101
nil     starttls: Network is unreachable        101
nil     starttls: Network is unreachable        101
nil     starttls: Network is unreachable        101
nil     starttls: Network is unreachable        101
nil     starttls: Network is unreachable        101
nil     starttls: Network is unreachable        101
nil     starttls: Network is unreachable        101
nil     starttls: Network is unreachable        101
nil     starttls: Network is unreachable        101
false   unable to update event disposition: No such file or directory (fd:28)   2       thread: 0xb6d75648      nil     28

The number of starttls lines varies from one to about a dozen, which leads me to believe that something is being garbage collected while a reference to it is still in use.

luveti commented Mar 26 '21 08:03

This code you linked works for me: it repeatedly (successfully) makes requests to google.com. Is there some other ingredient I need to reproduce?

daurnimator commented Mar 27 '21 12:03

Hey @daurnimator, it looks like I forgot to mention that the device is sitting behind a captive portal that hasn't been "signed into" yet. I set up a Raspberry Pi 3 Model B+ as a router using RaspAP and Nodogsplash.

I should also mention that I'm running cqueues on a Raspberry Pi 4 Model B (as part of a much larger project). I'm using the latest version of cqueues.

If you need hardware (or funds for some), we would be more than willing to donate. Trying to set up a captive portal on an old router is much harder than doing so on a Raspberry Pi!

luveti commented Mar 29 '21 05:03

As a temporary workaround, I ended up moving all my HTTP requests into threads (using a task queue I've had for a while). This worked pretty well for a while, as I could just let the thread die when the error described above occurred.

But after letting this run for a while, I started to get "Too many open files" errors from various other places in my program. I ran lsof -c luajit | wc -l and noticed that luajit was opening more and more files over time. I was able to reproduce this with the example above.
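The same trend can also be watched from inside the process instead of through lsof. The following is only a minimal sketch, assuming Linux with /proc mounted and a Lua build where io.popen is available; count_open_fds is a hypothetical helper and not part of cqueues. The count includes the pipe that io.popen itself opens, so only the change over time is meaningful.

```lua
-- Sketch: count this process's open file descriptors on Linux.
-- Assumes /proc is mounted and io.popen works; count_open_fds is a
-- hypothetical helper, not a cqueues API.
local function count_open_fds()
	-- io.popen runs the command through a shell whose parent is this
	-- process, so $PPID expands to our own PID.
	local pipe = io.popen("ls /proc/$PPID/fd 2>/dev/null")
	if not pipe then
		return nil
	end
	local n = 0
	for _ in pipe:lines() do
		n = n + 1
	end
	pipe:close()
	if n == 0 then
		return nil -- /proc is not available on this system
	end
	return n
end
```

Printing count_open_fds() once per iteration of the request loop should show whether the number keeps climbing after each failed request.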

luveti commented Mar 31 '21 08:03

I've been poking around in the internals of cqueues, and I'm starting to think there may be a leak somewhere in the DNS logic under so_open. If I pass an IP address to socket.connect, the issue I've described goes away. Resolving the domain name with dns.query doesn't appear to cause a leak. Here is an example that works:

local ce = require('cqueues.errno')
local cqueues = require('cqueues')
local dns = require('cqueues.dns')
local packet = require('cqueues.dns.packet')
local record = require('cqueues.dns.record')
local socket = require('cqueues.socket')

local cq = cqueues.new()

local function onerror(socket, op, why, lvl) -- luacheck: ignore 212
	local err = string.format("%s: %s", op, ce.strerror(why))
	if op == "starttls" then
		local ssl = socket:checktls()
		if ssl and ssl.getVerifyResult then
			local code, msg = ssl:getVerifyResult()
			if code ~= 0 then
				err = err .. ":" .. msg
			end
		end
	end
	if why == ce.ETIMEDOUT then
		if op == "fill" or op == "read" then
			socket:clearerr("r")
		elseif op == "flush" then
			socket:clearerr("w")
		end
	end
	return err, why
end

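-- Resolve a domain name to its first IPv4 address using cqueues.dns.
local function domain_to_ip_address(domain)
	local p, err_code = dns.query(domain, 'A')
	if not p then return nil, err_code end
	-- Return the first A record found in the answer section, if any.
	for r in p:grep({ section = packet.section.ANSWER, type = record.type.A }) do
		return r:addr()
	end
	return nil, 'no A records in answer'
end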

local function send_request()
	local ip, err = domain_to_ip_address('google.com')
	if not ip then
		print('failed to resolve domain name', err)
		return
	end

	local http = socket.connect(ip, 443)
	http:onerror(onerror)
	local ok, err, errno = http:starttls()
	if not ok then
		return nil, err, errno
	end
	http:write("GET / HTTP/1.0\n")
	http:write("Host: google.com:443\n\n")

	local status = http:read()
	print("!", status)
	for ln in http:lines "*h" do
		print("|", ln)
	end

	local empty = http:read "*L"
	print "~"

	for ln in http:lines "*L" do
		io.stdout:write(ln)
	end
	http:close()
end

cq:wrap(function()
	while true do
		print(pcall(function()
			print(send_request())
		end))
		cqueues.sleep(1)
	end
end)

print(cq:loop())

While poking around, I noticed defining SOCKET_DEBUG outputs some useful info. It appears socket.connect attempts to connect to both IPv4 and IPv6 addresses using the same file descriptor:

fd = 6
connect(google.com./[172.217.9.78]:443): Connection refused
fd = 6
connect(google.com./[2607:f8b0:4009:816::200e]:443): Network is unreachable
nil     starttls: Network is unreachable        101
true
fd = 6
connect(google.com./[172.217.9.78]:443): Connection refused
fd = 6
connect(google.com./[2607:f8b0:4009:816::200e]:443): Network is unreachable
nil     starttls: Network is unreachable        101
true
fd = 6
connect(google.com./[172.217.9.78]:443): Connection refused
fd = 6
connect(google.com./[2607:f8b0:4009:816::200e]:443): Network is unreachable
nil     starttls: Network is unreachable        101
true
false   unable to update event disposition: No such file or directory (fd:6)    2       thread: 0xb6d15ee8      nil     6

NOTE: I've added printf("fd = %i\n", fd); to so_trace in src/lib/socket.c.

I wonder if starttls should even be called if the calls to connect fail?
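One way to test that idea from the Lua side is to wait for the connection explicitly and only call starttls once it has succeeded. This is a minimal sketch, not a confirmed fix: connect_then_starttls is a hypothetical helper, and it assumes the socket already has an onerror handler installed (as in the examples above) so that failures come back as nil, err, errno. Whether it avoids the "unable to update event disposition" error is untested.

```lua
-- Sketch: wait for the TCP connection to complete before starting TLS.
-- connect_then_starttls is a hypothetical helper. sock is expected to be a
-- cqueues socket (or any object with :connect and :starttls methods).
local function connect_then_starttls(sock, timeout)
	-- socket:connect waits until the pending connect finishes or times out.
	local ok, err, errno = sock:connect(timeout)
	if not ok then
		-- The connection never came up, so the TLS handshake is skipped.
		return nil, err, errno
	end
	ok, err, errno = sock:starttls()
	if not ok then
		return nil, err, errno
	end
	return sock
end
```

In send_request this would replace the direct http:starttls() call, for example connect_then_starttls(http, 10), where the 10 second timeout is an arbitrary choice. The sketch deliberately does not close the socket on failure, since closing it at that point was reported above to produce a different error.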

luveti commented Apr 04 '21 02:04