
Connection pooling (session reuse) not working?

Open askkemp opened this issue 1 year ago • 1 comments

It appears that a new session is created for each RDAP query instead of reusing the previously established connection. Reuse does work as intended for bootstrapping. The http_pool_connections setting and the original implementation of session reuse are in https://github.com/meeb/whoisit/pull/2.

The following compares the behavior when querying three domains at release v2.4.1, at the latest commit, and when performing the queries directly with requests.

At commit 80dc563 (release v2.4.1)

Notice how bootstrapping opens one connection to data.iana.org and the remaining bootstrap requests reuse it. The same then happens for rdap.publicinterestregistry.org.

(.venv_80dc563) user@box:~/whoisit-v2.4.1$ python3.12 testing.py
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): data.iana.org:443
DEBUG:urllib3.connectionpool:https://data.iana.org:443 "GET /rdap/asn.json HTTP/1.1" 200 1311
DEBUG:urllib3.connectionpool:https://data.iana.org:443 "GET /rdap/dns.json HTTP/1.1" 200 8413
DEBUG:urllib3.connectionpool:https://data.iana.org:443 "GET /rdap/ipv4.json HTTP/1.1" 200 807
DEBUG:urllib3.connectionpool:https://data.iana.org:443 "GET /rdap/ipv6.json HTTP/1.1" 200 357
DEBUG:urllib3.connectionpool:https://data.iana.org:443 "GET /rdap/object-tags.json HTTP/1.1" 200 341
...
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): rdap.publicinterestregistry.org:443
DEBUG:urllib3.connectionpool:https://rdap.publicinterestregistry.org:443 "GET /rdap/domain/cnn.org HTTP/1.1" 200 8593
DEBUG:urllib3.connectionpool:https://rdap.publicinterestregistry.org:443 "GET /rdap/domain/facebook.org HTTP/1.1" 200 10549
DEBUG:urllib3.connectionpool:https://rdap.publicinterestregistry.org:443 "GET /rdap/domain/reddit.org HTTP/1.1" 200 9904

At latest commit 565350a

Notice how bootstrapping opens one connection to data.iana.org and the remaining bootstrap requests reuse it. But now a new connection is opened for each query to rdap.publicinterestregistry.org.

(.venv) user@box:~/whoisit-v3.0.4$ python3.12 testing.py
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): data.iana.org:443
DEBUG:urllib3.connectionpool:https://data.iana.org:443 "GET /rdap/asn.json HTTP/11" 200 1311
DEBUG:urllib3.connectionpool:https://data.iana.org:443 "GET /rdap/dns.json HTTP/11" 200 8413
DEBUG:urllib3.connectionpool:https://data.iana.org:443 "GET /rdap/ipv4.json HTTP/11" 200 807
DEBUG:urllib3.connectionpool:https://data.iana.org:443 "GET /rdap/ipv6.json HTTP/11" 200 357
DEBUG:urllib3.connectionpool:https://data.iana.org:443 "GET /rdap/object-tags.json HTTP/11" 200 341

DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): rdap.publicinterestregistry.org:443
DEBUG:urllib3.connectionpool:https://rdap.publicinterestregistry.org:443 "GET /rdap/domain/cnn.org HTTP/11" 200 8593
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): rdap.publicinterestregistry.org:443
DEBUG:urllib3.connectionpool:https://rdap.publicinterestregistry.org:443 "GET /rdap/domain/facebook.org HTTP/11" 200 10549
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): rdap.publicinterestregistry.org:443
DEBUG:urllib3.connectionpool:https://rdap.publicinterestregistry.org:443 "GET /rdap/domain/reddit.org HTTP/11" 200 9907

testing.py:

import logging
import os

import whoisit

# Show urllib3's connection pool messages so that new connections are visible.
log = logging.getLogger('urllib3')
log.setLevel(logging.DEBUG)
logging.basicConfig(level=logging.WARN)
os.environ['DEBUG'] = 'true'

whoisit.bootstrap()
whoisit.domain('cnn.org', allow_insecure_ssl=False, follow_related=False)
whoisit.domain('facebook.org', allow_insecure_ssl=False, follow_related=False)
whoisit.domain('reddit.org', allow_insecure_ssl=False, follow_related=False)
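
The difference between the two runs above comes down to how many "Starting new HTTPS connection" lines appear per host compared with the number of requests. A minimal sketch for tallying that from captured log text follows. It is not part of whoisit; the names `summarize`, `NEW_CONN`, and `REQUEST` are hypothetical, and it assumes the urllib3 debug line formats shown in the logs in this issue.

```python
import re
from collections import Counter

# Assumed format of the line urllib3 logs when it opens (or re-opens) a connection:
#   DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): host:443
#   DEBUG:urllib3.connectionpool:Resetting dropped connection: host
# A reset is counted as a new connection because the old socket was dropped.
NEW_CONN = re.compile(
    r"(?:Starting new HTTPS? connection \(\d+\)|Resetting dropped connection): "
    r"(?P<host>[^:\s]+)"
)

# Assumed format of a completed request line:
#   DEBUG:urllib3.connectionpool:https://host:443 "GET /path HTTP/1.1" 200 123
REQUEST = re.compile(
    r'urllib3\.connectionpool:https?://(?P<host>[^:/\s]+)(?::\d+)? "'
)


def summarize(log_text):
    """Return {host: (connections_opened, requests_made)} for a urllib3 debug log."""
    opened, served = Counter(), Counter()
    for line in log_text.splitlines():
        if (m := NEW_CONN.search(line)):
            opened[m["host"]] += 1
        elif (m := REQUEST.search(line)):
            served[m["host"]] += 1
    hosts = sorted(set(opened) | set(served))
    return {host: (opened[host], served[host]) for host in hosts}


if __name__ == "__main__":
    import sys

    # Usage: python3 testing.py 2>&1 | python3 summarize_log.py
    for host, (connections, requests_made) in summarize(sys.stdin.read()).items():
        print(f"{host}: {connections} connection(s) for {requests_made} request(s)")
```

With pooling working, each host should show one connection regardless of how many requests were made; a connection count equal to the request count indicates no reuse.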

Querying directly with requests

For this test, I queried two different RDAP providers. It appears rdap.verisign.com forces the connection closed after each response, so a new connection has to be created for the next query. In comparison, rdap.publicinterestregistry.org keeps the connection open, allowing later queries to reuse it.

(.venv_current) user@box:~/$ python3.12 testing.py
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): rdap.verisign.com:443
DEBUG:urllib3.connectionpool:https://rdap.verisign.com:443 "GET /com/v1/domain/cnn.com HTTP/11" 200 2165
DEBUG:urllib3.connectionpool:Resetting dropped connection: rdap.verisign.com
DEBUG:urllib3.connectionpool:https://rdap.verisign.com:443 "GET /com/v1/domain/facebook.com HTTP/11" 200 2405
DEBUG:urllib3.connectionpool:Resetting dropped connection: rdap.verisign.com
DEBUG:urllib3.connectionpool:https://rdap.verisign.com:443 "GET /com/v1/domain/reddit.com HTTP/11" 200 2429

DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): rdap.publicinterestregistry.org:443
DEBUG:urllib3.connectionpool:https://rdap.publicinterestregistry.org:443 "GET /rdap/domain/cnn.org HTTP/11" 200 8593
DEBUG:urllib3.connectionpool:https://rdap.publicinterestregistry.org:443 "GET /rdap/domain/facebook.org HTTP/11" 200 10549
DEBUG:urllib3.connectionpool:https://rdap.publicinterestregistry.org:443 "GET /rdap/domain/reddit.org HTTP/11" 200 9907

testing.py:

import logging

import requests

logging.basicConfig(level=logging.DEBUG)

# One shared session; the mounted adapter caches up to 10 per-host connection pools.
s = requests.Session()
s.mount('https://', requests.adapters.HTTPAdapter(pool_connections=10))
s.get('https://rdap.verisign.com/com/v1/domain/cnn.com')
s.get('https://rdap.verisign.com/com/v1/domain/facebook.com')
s.get('https://rdap.verisign.com/com/v1/domain/reddit.com')

s.get('https://rdap.publicinterestregistry.org/rdap/domain/cnn.org')
s.get('https://rdap.publicinterestregistry.org/rdap/domain/facebook.org')
s.get('https://rdap.publicinterestregistry.org/rdap/domain/reddit.org')
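
The server-side difference between the two providers can be reproduced locally without touching any real RDAP endpoint. The sketch below is an illustration under stated assumptions, not the behavior of whoisit or requests: it uses only the standard library (http.server and http.client instead of requests), and the `/keep` and `/close` paths, `CountingHandler`, and `count_connections` are hypothetical names. The server counts accepted TCP connections, and sends `Connection: close` only for paths starting with `/close`.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class CountingHandler(BaseHTTPRequestHandler):
    """Hypothetical test handler that counts accepted TCP connections."""

    protocol_version = "HTTP/1.1"  # HTTP/1.1 makes connections persistent by default
    connections = 0

    def setup(self):
        # setup() runs once per accepted connection, not once per request.
        super().setup()
        type(self).connections += 1

    def do_GET(self):
        body = b"{}"
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        if self.path.startswith("/close"):
            # Ask the client to drop the connection after this response.
            self.send_header("Connection", "close")
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo output quiet


def count_connections(path, requests_to_send=3):
    """Send several GETs through one client object; return connections the server saw."""

    class Handler(CountingHandler):
        connections = 0  # fresh counter for each run

    server = ThreadingHTTPServer(("127.0.0.1", 0), Handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        client = http.client.HTTPConnection("127.0.0.1", server.server_port)
        for _ in range(requests_to_send):
            client.request("GET", path)  # reconnects automatically if the socket was closed
            client.getresponse().read()
        client.close()
    finally:
        server.shutdown()
        server.server_close()
    return Handler.connections


if __name__ == "__main__":
    print("keep-alive server:", count_connections("/keep"), "connection(s)")
    print("closing server:   ", count_connections("/close"), "connection(s)")
```

Under these assumptions the keep-alive path should be served over a single connection, while the closing path should need one connection per request, which mirrors the "Resetting dropped connection" lines seen for rdap.verisign.com. A client that reconnects for every query even against a keep-alive server, as in the v3.0.4 log, points at the client side rather than the server.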

askkemp avatar Dec 21 '24 08:12 askkemp

Ah, you might be correct here. Thanks for the issue, I'll look into it before the next release.

meeb avatar Dec 22 '24 07:12 meeb