luasocket icon indicating copy to clipboard operation
luasocket copied to clipboard

Non-timeout dual-stack connections fail for IPv4-only servers due to missing fallback

Open Erlonealpha opened this issue 5 months ago • 4 comments

The following lines will directly return timeout in Non-timeout without attempting additional addresses: https://github.com/lunarmodules/luasocket/blob/4844a48fbf76b0400fd7b7e4d15d244484019df1/src/inet.c#L421

According to RFC 8305, the current getaddrinfo function will return both AF_INET6 and AF_INET addresses when dual-stack is enabled.

Issue: When a dual-stack client attempts to connect in Non-timeout mode to a server listening only on an IPv4 address, the connection becomes unreachable.

Recommendation: Add more options to control address selection during connect.

Erlonealpha avatar Jul 09 '25 07:07 Erlonealpha

Thanks for the clear report and supporting docs. Unfortunately I don't have much time to develop this right now, but PRs are welcome. Also if there are other pending PRs that affect you in any way if you can help review those, I can move them along. Some of them are difficult for me to test and confirmation from a 3rd party that the contributions work right or not would help move things along.

alerque avatar Jul 09 '25 09:07 alerque

Thank you for your response. I've already prepared a solution and will submit a PR later. In summary, the solution involves: Adding address family (default nil) and fallback (default true) options Letting the caller control address polling logic in non-timeout mode. Addendum: I'm still deciding whether to add these options directly as parameters or encapsulate them in a dedicated configuration table. What would be your preferred approach?

Erlonealpha avatar Jul 09 '25 11:07 Erlonealpha

I'm still deciding whether to add these options directly as parameters or encapsulate them in a dedicated configuration table. What would be your preferred approach?

Given the proliferation of options we're starting to hit I'm wondering if new ones should start using a table approach. I'd lean strongly that way with the caveat that we also need to think about backwards compatibility, so just moving existing options to a table isn't viable at this juncture, and mixing and matching isn't very pretty. How many more options do you think are likely to come up?

alerque avatar Jul 09 '25 13:07 alerque

I'm still deciding whether to add these options directly as parameters or encapsulate them in a dedicated configuration table. What would be your preferred approach?

Given the proliferation of options we're starting to hit I'm wondering if new ones should start using a table approach. I'd lean strongly that way with the caveat that we also need to think about backwards compatibility, so just moving existing options to a table isn't viable at this juncture, 和 mixing and matching isn't very pretty. How many more options do you think are likely to come up?

I tried this.

static int meth_connect(lua_State *L) {
    p_tcp tcp = (p_tcp) auxiliar_checkgroup(L, "tcp{any}", 1);
    const char *address =  luaL_checkstring(L, 2);
    const char *port = luaL_checkstring(L, 3);
    const char *err;
    if (!lua_isnoneornil(L, 4)) {
        struct addrinfo* user_ai = NULL;
        int ret = collect_addrinfo(L, 4, port, &user_ai);
        if (ret != 0) {
            /* this error is not included in the pierrors, now we just return it */
            err = "collect addrinfo error";
        } else {
            timeout_markstart(&tcp->tm);
            err = inet_tryconnect(&tcp->sock, &tcp->family, address, port,
                &tcp->tm, user_ai, 1);
        }
    } else {
        struct addrinfo connecthints;
        memset(&connecthints, 0, sizeof(connecthints));
        connecthints.ai_socktype = SOCK_STREAM;
        /* make sure we try to connect only to the same family */
        connecthints.ai_family = tcp->family;
        timeout_markstart(&tcp->tm);
        err = inet_tryconnect(&tcp->sock, &tcp->family, address, port,
            &tcp->tm, &connecthints, 0);
    }
    /* have to set the class even if it failed due to non-blocking connects */
    auxiliar_setclass(L, "tcp{client}", 1);
    if (err) {
        lua_pushnil(L);
        lua_pushstring(L, err);
        return 2;
    }
    lua_pushnumber(L, 1);
    return 1;
}

And test lua.

local function test_async_tcpconnect(option)
    option = option or {}
    option.tm = option.tm or {total = -1}
    option.yield = option.yield or coroutine.yield
    local sock = socket.tcp()
    local addrinfo = socket.dns.getaddrinfo("www.google.com")
    local ok, err
    for _, addr in ipairs(addrinfo) do
        if addr.family == 'inet6' then
            -- do something here ?
        end
        timeout_markstart(option.tm)
        ok, err = sock:connect("www.google.com", 80, {addr})
        if ok == 1 then
            return true, nil
        elseif err ~= "timeout" then
            return false, err
        end
        while true do
            local _, w, err_sel = socket.select(nil, {sock}, 0)
            if w[1] then
                return true, nil
            elseif err_sel ~= "timeout" then
                err = err_sel
                break
            else
                if yield_and_check_timeout(option.yield, option.tm) then
                    err = "timeout"
                    break
                end
            end
        end
    end
    return false, err
end

The test is ok for me.

Erlonealpha avatar Jul 10 '25 12:07 Erlonealpha