
Curl.perform() blocks SIGINT during the start of a SOCKS transfer

Open • fsbs opened this issue 4 years ago • 10 comments

Curl.perform() and CurlMulti.perform() can't be interrupted during the DNS stage when a "socks5h://" proxy is set, i.e. when the domain name is resolved remotely by the proxy.

However, interrupting does work in the following cases:

  • torsocks + pycurl-without-proxy
  • curl --proxy "socks5h://..."

To see the difference, run the following examples and press Ctrl+C immediately after starting them.

Example 1: pycurl.PROXY [not interruptible]

```python
import pycurl
import random

# use new circuit each time to prevent caching by tor
PROXY = f'socks5h://pycurl:{random.randint(0, 1024)}@127.0.0.1:9050'

# onion.debian.org
URL = 'http://jvgypgbnfyvfopg5msp6nwr2sl2fd6xmnguq35n7rfkw3yungjn2i4yd.onion/'

c = pycurl.Curl()
c.setopt(pycurl.VERBOSE, 1)
c.setopt(pycurl.PROXY, PROXY)  # or pycurl.PRE_PROXY
c.setopt(pycurl.URL, URL)

def easy():
    c.perform()

def multi():
    m = pycurl.CurlMulti()
    m.add_handle(c)
    while m.perform()[1]:
        m.select(1.0)

#easy()
multi()
```

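The random password in `PROXY` relies on Tor isolating streams by SOCKS credentials (Tor's `IsolateSOCKSAuth` behaviour, on by default for its SOCKS port), so each run should get a fresh circuit. A minimal sketch of a helper that builds such a URL, assuming Tor listens on 127.0.0.1:9050; the name `isolated_socks5h_url` is hypothetical. It uses `secrets` instead of `random.randint(0, 1024)`, whose 1025 possible values make repeats between runs fairly likely:

```python
import secrets
from urllib.parse import quote


def isolated_socks5h_url(host: str = "127.0.0.1", port: int = 9050,
                         user: str = "pycurl") -> str:
    """Build a socks5h:// proxy URL with a random SOCKS password.

    Assumes Tor's default IsolateSOCKSAuth behaviour: streams with
    different SOCKS credentials are kept on different circuits.
    """
    # 8 random bytes -> 16 hex characters, so repeats are practically impossible.
    password = secrets.token_hex(8)
    # Percent-encode the user name in case it contains ':' or '@'.
    # socks5h means the proxy, not the client, resolves the target host name.
    return f"socks5h://{quote(user, safe='')}:{password}@{host}:{port}"


if __name__ == "__main__":
    print(isolated_socks5h_url())
```

With this, `c.setopt(pycurl.PROXY, isolated_socks5h_url())` would take the place of the `PROXY` constant above.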
Example 2: torsocks + pycurl [interruptible]

Same as above, but without the c.setopt(pycurl.PROXY, PROXY) line. Run with torsocks:

```bash
torsocks --isolate python3 example2.py
```

This is interruptible, probably because pycurl isn't communicating with the SOCKS proxy itself; instead, that is delegated to the torsocks wrapper, which pycurl knows nothing about. So the issue above probably lies in how pycurl handles proxies.
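torsocks hooks a program by preloading its library (via `LD_PRELOAD` on Linux) so that libc's network calls go through Tor, which is why pycurl never sees a proxy. A minimal heuristic sketch for telling the two setups apart, assuming the preloaded library's file name contains "torsocks"; the function name `running_under_torsocks` is hypothetical:

```python
import os
from typing import Mapping, Optional


def running_under_torsocks(env: Optional[Mapping[str, str]] = None) -> bool:
    """Heuristic: is a torsocks library listed in LD_PRELOAD?

    Assumes the wrapper injects a library whose file name contains
    "torsocks"; this is a guess, not an official torsocks interface.
    """
    env = os.environ if env is None else env
    preload = env.get("LD_PRELOAD", "")
    # LD_PRELOAD entries may be separated by spaces or colons.
    entries = preload.replace(":", " ").split()
    return any("torsocks" in os.path.basename(entry) for entry in entries)


if __name__ == "__main__":
    if running_under_torsocks():
        print("torsocks detected: SOCKS is handled outside pycurl")
    else:
        print("no torsocks: pycurl.PROXY would be needed")
```

A script like Example 1 could then leave `pycurl.PROXY` unset when the wrapper is already doing the SOCKS work.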

Example 3: curl --proxy [interruptible]

```bash
#!/bin/bash
PROXY="socks5h://curl:$(($RANDOM % 1024))@127.0.0.1:9050"

URL='http://jvgypgbnfyvfopg5msp6nwr2sl2fd6xmnguq35n7rfkw3yungjn2i4yd.onion/'

curl --proxy "$PROXY" --verbose "$URL"
```

This is the same as the first example, but using curl instead of pycurl. It is interruptible as expected, so the issue doesn't reach as deep as libcurl; it is pycurl-specific.

Versions

  • pycurl: PycURL/7.44.1 libcurl/7.68.0 GnuTLS/3.6.13 zlib/1.2.11 brotli/1.0.7 libidn2/2.2.0 libpsl/0.21.0 (+libidn2/2.2.0) libssh/0.9.3/openssl/zlib nghttp2/1.40.0 librtmp/2.3
  • curl:
curl 7.68.0 (x86_64-pc-linux-gnu) libcurl/7.68.0 OpenSSL/1.1.1f zlib/1.2.11 brotli/1.0.7 libidn2/2.2.0 libpsl/0.21.0 (+libidn2/2.2.0) libssh/0.9.3/openssl/zlib nghttp2/1.40.0 librtmp/2.3
Release-Date: 2020-01-08
Protocols: dict file ftp ftps gopher http https imap imaps ldap ldaps pop3 pop3s rtmp rtsp scp sftp smb smbs smtp smtps telnet tftp 
Features: AsynchDNS brotli GSS-API HTTP2 HTTPS-proxy IDN IPv6 Kerberos Largefile libz NTLM NTLM_WB PSL SPNEGO SSL TLS-SRP UnixSockets
  • python: Python 3.8.10

fsbs, Oct 07 '21

Does it make any difference if you set the NOSIGNAL option?

swt2c, Oct 08 '21

@swt2c No, that has no effect.

The same issue also occurs with socket_action.

It might not be specifically related to remotely resolving domain names; it's just that this is part of the first stage of a SOCKS connection.

As far as I can see, pycurl delegates the SOCKS logic to libcurl anyway, but this issue could instead be caused by pycurl's GIL handling. libcurl may invoke additional callbacks when a proxy is used, since setting up such a connection takes extra steps. Such callbacks are where GIL problems can occur: the nested calls lead back into pycurl at a different spot, where wrong assumptions can be made about the GIL state.

Note also that libcurl invokes callbacks mostly at the very start of a transfer, which is when this issue occurs.
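For context on "the first stage of a SOCKS connection": with `socks5h://`, the client sends the host name itself to the proxy instead of an IP address, so name resolution becomes part of the SOCKS handshake. A minimal sketch of the client-side messages as defined by RFC 1928 (SOCKS5) and RFC 1929 (username/password authentication). This illustrates the protocol only; it is not libcurl's code, and the function names are hypothetical:

```python
import struct


def socks5_greeting(with_auth: bool) -> bytes:
    """Client greeting (RFC 1928): VER=5, NMETHODS, METHODS."""
    # 0x00 = no authentication, 0x02 = username/password (RFC 1929)
    methods = [0x00, 0x02] if with_auth else [0x00]
    return bytes([0x05, len(methods), *methods])


def socks5_userpass(user: str, password: str) -> bytes:
    """Username/password sub-negotiation (RFC 1929): VER=1, ULEN, UNAME, PLEN, PASSWD."""
    u, p = user.encode(), password.encode()
    if not (1 <= len(u) <= 255 and 1 <= len(p) <= 255):
        raise ValueError("username and password must each be 1..255 bytes")
    return bytes([0x01, len(u)]) + u + bytes([len(p)]) + p


def socks5_connect_hostname(host: str, port: int) -> bytes:
    """CONNECT request with ATYP=0x03 (domain name), the socks5h case.

    The proxy, not the client, resolves `host`.
    """
    name = host.encode("ascii")
    if not 1 <= len(name) <= 255:
        raise ValueError("host name must be 1..255 bytes")
    # VER=5, CMD=1 (CONNECT), RSV=0, ATYP=3, then length-prefixed name and port
    return bytes([0x05, 0x01, 0x00, 0x03, len(name)]) + name + struct.pack(">H", port)
```

The proxy answers each message before the next is sent, so a client that blocks while waiting for these replies stalls the whole transfer at its very start.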

fsbs, Oct 10 '21

My bad: there was a regression affecting SOCKS proxies in libcurl itself, which was already fixed in 7.71.0 (see https://github.com/curl/curl/issues/5710#issuecomment-663007894).

Those fixes take care of multi.perform and multi.socket_action. (They also fixed socket_action being a blocking call during the SOCKS kickstart, so SOCKS transfers now play nicely with async event loops.)

However, easy.perform() still blocks SIGINT when a SOCKS proxy is used. It's not important for me personally, but I'll leave the issue open and change the title accordingly.
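Since the libcurl fixes cover `multi.perform()` but not `easy.perform()`, one workaround is to drive a single easy handle through `CurlMulti` with a short `select()` timeout, so control returns to the interpreter between iterations and a pending Ctrl+C can surface as `KeyboardInterrupt`. A minimal sketch, assuming pycurl on top of libcurl 7.71.0 or newer (where, per this thread, the SOCKS connect no longer blocks inside the multi calls); `perform_interruptibly` is a hypothetical helper, not part of pycurl:

```python
import pycurl


def perform_interruptibly(curl: pycurl.Curl, timeout: float = 0.5) -> None:
    """Run one easy handle via CurlMulti so Python regains control often.

    Pending signal handlers run once the loop executes Python code again,
    roughly every `timeout` seconds at most.
    """
    multi = pycurl.CurlMulti()
    multi.add_handle(curl)
    try:
        while True:
            ret, active = multi.perform()
            if ret == pycurl.E_CALL_MULTI_PERFORM:
                continue  # libcurl asks to be called again right away
            if not active:
                break  # transfer finished (successfully or not)
            multi.select(timeout)
        # Raise transfer errors the way Curl.perform() would.
        _, _, failed = multi.info_read()
        for _handle, code, message in failed:
            raise pycurl.error(code, message)
    finally:
        multi.remove_handle(curl)
        multi.close()
```

In the pycurl example, `perform_interruptibly(curl)` would replace `curl.perform()`.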

fsbs, Oct 12 '21

> However easy.perform() still blocks SIGINT when a SOCKS proxy is used. Not important for me personally, but I'll leave the issue open and change the title accordingly.

Are you sure that it is pycurl doing that and not libcurl?

swt2c, Oct 12 '21

@swt2c It doesn't seem so, because of example 3, which doesn't block SIGINT. CLI curl uses curl_easy_perform by default, unless it's run with --parallel. But it probably sets some additional easy options compared to example 1, so I can't say for certain.

When I have some spare time I'll rewrite example 1 in libcurl.

fsbs, Oct 12 '21

Here's a libcurl example and its pycurl equivalent, both setting a SOCKS proxy via CURLOPT_PROXY.

The libcurl one terminates immediately on SIGINT. The pycurl one raises KeyboardInterrupt only after perform() returns.

libcurl

```c
#include <curl/curl.h>
#include <stdio.h>

int main(void)
{
    printf("%s\n", curl_version());
    CURL *curl = curl_easy_init();
    if (!curl)
        return 1;  /* handle creation failed */

    curl_easy_setopt(curl, CURLOPT_VERBOSE, 1L);
    curl_easy_setopt(curl, CURLOPT_URL, "http://jvgypgbnfyvfopg5msp6nwr2sl2fd6xmnguq35n7rfkw3yungjn2i4yd.onion/");
    curl_easy_setopt(curl, CURLOPT_PROXY, "socks5h://127.0.0.1:9050");

    /* doesn't block SIGINT */
    curl_easy_perform(curl);

    curl_easy_cleanup(curl);
    return 0;
}
```

pycurl

```python
import pycurl

print(pycurl.version)
curl = pycurl.Curl()

curl.setopt(pycurl.VERBOSE, 1)
curl.setopt(pycurl.URL, 'http://jvgypgbnfyvfopg5msp6nwr2sl2fd6xmnguq35n7rfkw3yungjn2i4yd.onion/')
curl.setopt(pycurl.PROXY, 'socks5h://127.0.0.1:9050')

# blocks SIGINT
curl.perform()

curl.close()
```

I've seen this happen only when a SOCKS proxy is set via pycurl.

Other means of proxifying the same Python script don't have this issue, for example removing the pycurl.PROXY line and using the torsocks wrapper instead: torsocks python3 example.py.

I've also tested with different SSL libraries (OpenSSL, GnuTLS, NSS) when building pycurl and the libcurl example above, and it made no difference.
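Some background on the asymmetry (general CPython behaviour, not something established in this thread): CPython's C-level SIGINT handler only records the signal, and the Python-level handler (`signal.default_int_handler`, which raises `KeyboardInterrupt`) runs in the main thread once the interpreter executes Python code again, for example after `perform()` returns. The C program keeps the default disposition, so the kernel terminates it immediately. A blunt way to get that behaviour in Python is to restore `SIG_DFL` around the call. A minimal sketch; the context manager name is hypothetical, and the process then dies without running `finally` blocks, `atexit` handlers or `curl.close()`:

```python
import contextlib
import signal


@contextlib.contextmanager
def sigint_kills_process():
    """Temporarily give SIGINT the OS default action (terminate the process).

    Must be used from the main thread: signal.signal() raises ValueError elsewhere.
    """
    previous = signal.signal(signal.SIGINT, signal.SIG_DFL)
    try:
        yield
    finally:
        # Restore the earlier handler (normally signal.default_int_handler).
        # signal.signal() returns None if that handler was not installed
        # from Python, in which case there is nothing to restore.
        if previous is not None:
            signal.signal(signal.SIGINT, previous)
```

Usage: `with sigint_kills_process(): curl.perform()`.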

fsbs, Nov 03 '21

If the wait is inside libcurl, I suggest experimenting with the NOSIGNAL option and trying a non-blocking DNS resolver (c-ares/threaded?).

p, Jan 11 '22

NOSIGNAL and a non-blocking resolver make no difference on my end. I'm double-checking that an async resolver is present with:

```python
print('ASYNCHDNS:', pycurl.version_info()[4] & pycurl.VERSION_ASYNCHDNS)
```

I also tried removing the setopt calls pycurl makes internally, as well as the GIL code in do_curl_perform() (BEGIN/END_ALLOW_THREADS), and SIGINT still doesn't interrupt curl_easy_perform().

I can't see any other spot that could cause this issue. Could someone else test this with a SOCKS proxy?

fsbs, Jan 20 '22

I suggest trying a modern libcurl version where the SOCKS connect procedure has been remade to be totally non-blocking.

bagder, Jan 31 '22

@bagder The issue affects pycurl only, not libcurl. What I described in https://github.com/pycurl/pycurl/issues/706#issuecomment-958907934 is still the case in the latest libcurl (7.81.0) and latest pycurl (7.44.1) release: interrupting curl_easy_perform() works in libcurl but not in pycurl. The pycurl build I tested with is based on the latest libcurl release.

fsbs, Jan 31 '22