Possible memory leak when combining session, threading and proxies
If it helps: when running the script I got the error OSError: [Errno 24] Too many open files. I'm not sure whether it is related to the memory leak; I worked around it by raising the limit with ulimit -n 10000.
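If you'd rather not touch the shell limit, the same effect can be had from inside the script. This is only a sketch using the standard resource module on Linux; it is not required to reproduce the leak, and it assumes the hard limit is already at least 10000.

import resource

# Equivalent to `ulimit -n 10000`: raise the soft limit on open file
# descriptors (assumes the hard limit is already >= 10000)
_, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
resource.setrlimit(resource.RLIMIT_NOFILE, (10000, hard))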
Expected Result
RAM usage kept under reasonable limits
Actual Result
RAM usage doesn't stop growing
Reproduction Steps
I usually wouldn't post the target website or the proxy credentials, but in this case I think they are needed to reproduce the bug.
import requests
from threading import Thread
from time import sleep
session = requests.Session()
from memory_profiler import profile
from random import randrange

finished = False


def get_proxy():
    # build a per-request proxy URL with a random session id
    proxy = "http://lum-customer-hl_f53c879b-zone-static-session-" + str(randrange(999999)) + ":[email protected]:22225"
    return {
        "http": proxy,
        "https": proxy
    }


def make_request(url):
    session.get(url, proxies=get_proxy())


def worker():
    # keep requesting the same page until main() signals completion
    while True:
        if finished:
            return
        make_request("http://1000imagens.com/")


@profile
def main():
    global finished
    threads = []
    for i in range(2):
        t = Thread(target=worker)
        t.start()
        threads.append(t)

    count = 0
    while True:
        sleep(1)
        count += 1
        if count == 300:
            finished = True
            return


main()
System Information
$ python3.9 -m requests.help
{
"chardet": {
"version": "3.0.4"
},
"cryptography": {
"version": ""
},
"idna": {
"version": "2.6"
},
"implementation": {
"name": "CPython",
"version": "3.9.1"
},
"platform": {
"release": "4.15.0-134-generic",
"system": "Linux"
},
"pyOpenSSL": {
"openssl_version": "",
"version": null
},
"requests": {
"version": "2.25.1"
},
"system_ssl": {
"version": "1010100f"
},
"urllib3": {
"version": "1.22"
},
"using_pyopenssl": false
}
# lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 18.04.5 LTS
Release: 18.04
Codename: bionic
I tried with Python versions 3.6, 3.8 and 3.9 and found no difference.
Output of memory_profiler
Line #    Mem usage    Increment  Occurences   Line Contents
============================================================
    31     23.8 MiB     23.8 MiB           1   @profile
    32                                         def main():
    33                                             global finished
    34     23.8 MiB      0.0 MiB           1       threads = []
    35     23.8 MiB      0.0 MiB           3       for i in range(2):
    36     23.8 MiB      0.0 MiB           2           t = Thread(target=worker)
    37     23.8 MiB      0.0 MiB           2           t.start()
    38     23.8 MiB      0.0 MiB           2           threads.append(t)
    39
    40     23.8 MiB      0.0 MiB           1       count = 0
    41                                             while True:
    42    547.1 MiB    523.2 MiB         300           sleep(1)
    43    547.1 MiB      0.0 MiB         300           count += 1
    44    547.1 MiB      0.0 MiB         300           if count == 300:
    45    547.1 MiB      0.0 MiB           1               finished = True
    46    547.1 MiB      0.0 MiB           1               return
After 5 minutes it eats over 500 MB of RAM. If I leave it running indefinitely, it consumes all available RAM and the process gets killed.
If I add verify=False to the same script it doesn't leak, so it seems related to SSL verification.
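The only change relative to the script above is adding verify=False to the get() call in make_request:

def make_request(url):
    # same as before, but with TLS certificate verification disabled
    session.get(url, proxies=get_proxy(), verify=False)

With that change, memory_profiler reports: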
Line #    Mem usage    Increment  Occurences   Line Contents
============================================================
    31     23.9 MiB     23.9 MiB           1   @profile
    32                                         def main():
    33                                             global finished
    34     23.9 MiB      0.0 MiB           1       threads = []
    35     24.2 MiB      0.0 MiB           3       for i in range(2):
    36     24.1 MiB      0.0 MiB           2           t = Thread(target=worker)
    37     24.2 MiB      0.3 MiB           2           t.start()
    38     24.2 MiB      0.0 MiB           2           threads.append(t)
    39
    40     24.2 MiB      0.0 MiB           1       count = 0
    41                                             while True:
    42     67.5 MiB     43.3 MiB         300           sleep(1)
    43     67.5 MiB      0.0 MiB         300           count += 1
    44     67.5 MiB      0.0 MiB         300           if count == 300:
    45     67.5 MiB      0.0 MiB           1               finished = True
    46     67.5 MiB      0.0 MiB           1               return
Yes. Every report of a memory leak we've had has been related to using TLS. We've never been able to track it further than the SSL library.
When using random proxies, session.get_adapter("http://").proxy_manager does not remove ProxyManager objects; too many ProxyManager objects pile up, which leaks memory.

session = requests.session()
for x in range(1, 100):
    try:
        session.get("http://test.comaaa", proxies={"http": "http://{}:{}".format(x, x)}, timeout=0.1)
    except:
        continue
print(session.get_adapter("http://").proxy_manager)
+1 same issue here
Sure. In the method requests.adapters.HTTPAdapter.proxy_manager_for(), when a proxy is used, manager = self.proxy_manager[proxy] = proxy_from_url(...) acts as a cache: every random proxy URL gets its own entry in self.proxy_manager (a dict). When using a single session, this cache is never cleared, so it keeps growing and leaks memory. To solve this, do we need to pop values from it manually?
Here's my solution: use self.session = requests.sessions.Session() to handle cookies for the website's login, use with self.session.get(url, headers=headers, proxies=self.proxies, ...) as self.response: to ensure the response is closed after the request, and then, in the method that changes self.proxies, call self.session.get_adapter("https://").proxy_manager.clear() to clear the proxy_manager's cache. This works for me.
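To make that workaround concrete, here is a minimal, self-contained sketch. The proxy URL and the rotate_proxy helper are placeholders for illustration (not part of requests), and clearing the "http://" adapter in addition to the "https://" one is an extra assumption:

import requests

session = requests.Session()

def rotate_proxy(session, proxy_url):
    # Hypothetical helper: drop the cached ProxyManager objects before switching
    # to a new proxy, otherwise one manager per distinct proxy URL stays in memory.
    session.get_adapter("http://").proxy_manager.clear()
    session.get_adapter("https://").proxy_manager.clear()
    return {"http": proxy_url, "https": proxy_url}

for i in range(100):
    # placeholder proxy credentials/host; substitute your own
    proxies = rotate_proxy(session, "http://user-session-%d:password@proxy.example.com:22225" % i)
    try:
        # closing the response explicitly mirrors the `with ... as response` pattern above
        with session.get("http://1000imagens.com/", proxies=proxies, timeout=10) as response:
            response.content
    except requests.RequestException:
        continue

Clearing the cache trades per-proxy connection reuse for bounded memory, which is a reasonable trade when each proxy URL is only used once anyway.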