
Possible memory leak when combining session, threading and proxies

Open jaimecoj opened this issue 4 years ago • 7 comments

If it helps: I got the error OSError: [Errno 24] Too many open files when running the script. I'm not sure whether it's related to the memory leak; I worked around it by raising the limit with ulimit -n 10000.
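As a side note, the same limit can be raised from inside a Python script on Unix. This is a minimal sketch using the standard resource module; the 10000 value just mirrors the ulimit workaround above:

import resource

# Equivalent in spirit to `ulimit -n 10000` (Unix only): raise the soft
# limit on open file descriptors; it may not exceed the hard limit.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
target = 10000 if hard == resource.RLIM_INFINITY else min(10000, hard)
resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))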

Expected Result

RAM usage kept under reasonable limits

Actual Result

RAM usage doesn't stop growing

Reproduction Steps

I usually wouldn't post the target website or the proxy credentials, but in this case I think they are needed to reproduce the bug.

import requests
from threading import Thread
from time import sleep
from random import randrange
from memory_profiler import profile

# One shared Session, used concurrently by the worker threads.
session = requests.Session()
finished = False


def get_proxy():
    # A random session id makes every request go out through a different proxy.
    proxy = "http://lum-customer-hl_f53c879b-zone-static-session-" + str(randrange(999999)) + ":[email protected]:22225"
    return {
        "http": proxy,
        "https": proxy
    }


def make_request(url):
    session.get(url, proxies=get_proxy())


def worker():
    while True:
        if finished:
            return
        make_request("http://1000imagens.com/")


@profile
def main():
    global finished
    threads = []
    for i in range(2):
        t = Thread(target=worker)
        t.start()
        threads.append(t)

    count = 0
    while True:
        sleep(1)
        count += 1
        if count == 300:  # stop after roughly 5 minutes
            finished = True
            return


main()
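With memory_profiler installed, the script runs directly and the @profile decorator prints the per-line memory table when main() returns (script.py stands for whatever filename the snippet above is saved as):

pip install memory_profiler
python3.9 script.py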

System Information

$ python3.9 -m requests.help
{
  "chardet": {
    "version": "3.0.4"
  },
  "cryptography": {
    "version": ""
  },
  "idna": {
    "version": "2.6"
  },
  "implementation": {
    "name": "CPython",
    "version": "3.9.1"
  },
  "platform": {
    "release": "4.15.0-134-generic",
    "system": "Linux"
  },
  "pyOpenSSL": {
    "openssl_version": "",
    "version": null
  },
  "requests": {
    "version": "2.25.1"
  },
  "system_ssl": {
    "version": "1010100f"
  },
  "urllib3": {
    "version": "1.22"
  },
  "using_pyopenssl": false
}
# lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 18.04.5 LTS
Release:        18.04
Codename:       bionic

I tried with Python versions 3.6, 3.8 and 3.9 and found no difference.

Output of memory_profiler

Line #    Mem usage    Increment  Occurences   Line Contents
============================================================
    31     23.8 MiB     23.8 MiB           1   @profile
    32                                         def main():
    33                                             global finished
    34     23.8 MiB      0.0 MiB           1       threads = []
    35     23.8 MiB      0.0 MiB           3       for i in range(2):
    36     23.8 MiB      0.0 MiB           2           t = Thread(target=worker)
    37     23.8 MiB      0.0 MiB           2           t.start()
    38     23.8 MiB      0.0 MiB           2           threads.append(t)
    39
    40     23.8 MiB      0.0 MiB           1       count = 0
    41                                             while True:
    42    547.1 MiB    523.2 MiB         300           sleep(1)
    43    547.1 MiB      0.0 MiB         300           count += 1
    44    547.1 MiB      0.0 MiB         300           if count == 300:
    45    547.1 MiB      0.0 MiB           1               finished = True
    46    547.1 MiB      0.0 MiB           1               return

After 5 minutes it consumes over 500 MB of RAM. If I left it running indefinitely, it would consume all available RAM and eventually be killed.

jaimecoj avatar Jan 22 '21 08:01 jaimecoj

If I add verify=False to the same script, it doesn't leak, so the problem seems related to SSL verification.
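For reference, the only change (sketched against the script above) is in make_request:

def make_request(url):
    # verify=False disables TLS certificate verification; with it, memory stays flat.
    session.get(url, proxies=get_proxy(), verify=False)

With that change the profile looks like this: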

Line #    Mem usage    Increment  Occurences   Line Contents
============================================================
    31     23.9 MiB     23.9 MiB           1   @profile
    32                                         def main():
    33                                             global finished
    34     23.9 MiB      0.0 MiB           1       threads = []
    35     24.2 MiB      0.0 MiB           3       for i in range(2):
    36     24.1 MiB      0.0 MiB           2           t = Thread(target=worker)
    37     24.2 MiB      0.3 MiB           2           t.start()
    38     24.2 MiB      0.0 MiB           2           threads.append(t)
    39
    40     24.2 MiB      0.0 MiB           1       count = 0
    41                                             while True:
    42     67.5 MiB     43.3 MiB         300           sleep(1)
    43     67.5 MiB      0.0 MiB         300           count += 1
    44     67.5 MiB      0.0 MiB         300           if count == 300:
    45     67.5 MiB      0.0 MiB           1               finished = True
    46     67.5 MiB      0.0 MiB           1               return

jaimecoj avatar Jan 22 '21 08:01 jaimecoj

Yes. Every report of a memory leak we've had has been related to using TLS. We've never been able to track it further than the SSL library.

sigmavirus24 avatar Jan 23 '21 14:01 sigmavirus24

When using random proxies, session.get_adapter("http://").proxy_manager never removes its ProxyManager objects, so too many ProxyManager objects accumulate and memory leaks:

session = requests.session()
for x in range(1, 100):
    try:
        session.get("http://test.comaaa", proxies={"http": "http://{}:{}".format(x, x)}, timeout=0.1)
    except Exception:
        continue
print(session.get_adapter("http://").proxy_manager)  # one entry per distinct proxy URL

shukai avatar Jul 12 '21 10:07 shukai

+1 same issue here

timreibe avatar Jan 19 '22 09:01 timreibe

+1 same issue here

ll125498a avatar Jun 17 '22 11:06 ll125498a

Sure. In requests.adapters.HTTPAdapter.proxy_manager_for(), whenever a proxy is used, manager = self.proxy_manager[proxy] = proxy_from_url(...) acts as a cache: every distinct proxy URL adds an entry to self.proxy_manager (a dict). On a long-lived session this dict never evicts its values, so it keeps growing and leaks memory. To work around it, do we need to pop values from it manually?

yoursock avatar Nov 01 '23 03:11 yoursock
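One possible answer to the question above is to bound the cache yourself. Below is a minimal sketch (not part of requests; BoundedProxyAdapter and max_proxies are illustrative names) that subclasses HTTPAdapter and evicts the oldest cached ProxyManager once a cap is exceeded:

import requests
from requests.adapters import HTTPAdapter


class BoundedProxyAdapter(HTTPAdapter):
    """HTTPAdapter whose proxy_manager cache is capped in size."""

    def __init__(self, max_proxies=100, **kwargs):
        self._max_proxies = max_proxies
        super().__init__(**kwargs)

    def proxy_manager_for(self, proxy, **proxy_kwargs):
        manager = super().proxy_manager_for(proxy, **proxy_kwargs)
        # dicts preserve insertion order (Python 3.7+), so the first key is
        # the oldest cached ProxyManager; evict until we are under the cap.
        while len(self.proxy_manager) > self._max_proxies:
            oldest = next(iter(self.proxy_manager))
            self.proxy_manager.pop(oldest).clear()  # clear() closes its pools
        return manager


session = requests.Session()
adapter = BoundedProxyAdapter(max_proxies=50)
session.mount("http://", adapter)
session.mount("https://", adapter)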

Here's my solution: use self.session = requests.sessions.Session() to handle cookies for the website's login, use with self.session.get(url, headers=headers, proxies=self.proxies, ...) as self.response: to ensure the response is closed after each request, and then, in the method that changes self.proxies, call self.session.get_adapter("https://").proxy_manager.clear() to clear the proxy_manager cache. This works for me.

yoursock avatar Nov 02 '23 02:11 yoursock
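For concreteness, here is a minimal sketch of that pattern (ProxyClient, set_proxies and fetch are illustrative names, not from the thread). Note that a Session mounts separate default adapters for http:// and https://, so both caches are cleared:

import requests


class ProxyClient:
    def __init__(self):
        self.session = requests.Session()  # keeps login cookies across requests
        self.proxies = None

    def set_proxies(self, proxies):
        self.proxies = proxies
        # Drop every cached ProxyManager so the old ones can be garbage collected.
        self.session.get_adapter("http://").proxy_manager.clear()
        self.session.get_adapter("https://").proxy_manager.clear()

    def fetch(self, url, headers=None):
        # The context manager guarantees the response is closed after the request.
        with self.session.get(url, headers=headers, proxies=self.proxies) as response:
            return response.text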