RAM increases slowly
First of all, thanks for your work. When I use grequests, RAM usage increases slowly. Code such as:
```python
import requests
import grequests

def exception_handler(request, exception):
    print("Request failed request:{} \n exception:{}".format(request, exception))

if __name__ == '__main__':
    task = []
    f_file = "./data_scp/3031_xiuxiu_coverImage_v1.dat"
    session = requests.session()
    with open(f_file, "r") as r_f:
        for i in r_f:
            tmp = i.strip("\n").split(",")
            url = tmp[-1]
            feed_id = tmp[0]
            rs = grequests.request("GET", url, session=session)
            task.append(rs)
    resp = grequests.imap(task, size=30, exception_handler=exception_handler)
    for i in resp:
        if i.status_code == 200:
            print(i.status_code)
```
The 3031_xiuxiu_coverImage_v1.dat file looks like:

```
6650058925696645684,http://***8.jpg
6650058925696645684,http://***8.jpg
6650058925696645684,http://***8.jpg
6650058925696645684,http://***8.jpg
```
My grequests version is 0.4.0. Thanks in advance.
Thanks for reporting this. I'll try to reproduce the issue and figure out where memory is building up.
If you feel inclined, you can try profiling your own application, for example with memory-profiler, which may be able to tell you where memory is building up.
Though, some memory increase should probably be expected as you take in response data. I assume you mean it's a very large or unexpected buildup of memory :)
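If memory-profiler isn't handy, the standard library's `tracemalloc` module can give a similar picture of where allocations accumulate; a minimal sketch (the list comprehension is just a stand-in for the real workload):

```python
import tracemalloc

tracemalloc.start()

# Stand-in for the real workload: build up some allocations
data = [b"x" * 1024 for _ in range(1000)]

snapshot = tracemalloc.take_snapshot()
stats = snapshot.statistics("lineno")

# The biggest allocation sites come first
for stat in stats[:3]:
    print(stat)
```

Running this around the request loop instead would show which line is responsible for the growth.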
Thanks for your reply. My data size is (100000,). The RAM builds up from 400 MB to 5000 MB and is still increasing. The profile:
```
Line #    Mem usage    Increment   Line Contents
================================================
    10   49.098 MiB   49.098 MiB   @profile
    11                             def test():
    12   49.098 MiB    0.000 MiB       with open(f_file,"r") as r_f:
    13  425.777 MiB    0.258 MiB           for i in r_f:
    14  425.777 MiB    0.258 MiB               tmp = i.strip("\n").split(",")
    15  425.777 MiB    0.258 MiB               url = tmp[-1]
    16  425.777 MiB    0.223 MiB               feed_id = tmp[0]
    17  425.777 MiB    0.258 MiB               rs = grequests.request("GET", url,session=session)
    18  425.777 MiB    0.773 MiB               task.append(rs)
    19
    20  425.777 MiB    0.000 MiB       resp = grequests.imap(task, size=30,exception_handler=exception_handler)
    21
    22 3647.770 MiB    5.512 MiB       for i in resp:
    23 3647.758 MiB    0.227 MiB           if i.status_code ==200:
    24 3647.758 MiB    0.184 MiB               print(i.status_code)
```
So, I'm getting closer to figuring out what is going on. Here's a few things I've discovered thus far...
### grequests opening (and not closing) a new session for each request prevents freeing of memory
Take the following code:
```python
import grequests
from memory_profiler import profile

@profile
def test():
    url = "https://httpbin.org/status/200"
    reqs = [grequests.get(url) for _ in range(100)]
    responses = grequests.imap(reqs, size=5)
    for resp in responses:
        ...
    print('ok')  # memory should be freed by now
```
Notice that the memory builds up (104 MiB) and is never really released, despite no (apparent) references existing anymore. The buildup also gets bigger if I increase the number of requests.
```
Line #    Mem usage    Increment   Line Contents
================================================
     5   35.977 MiB   35.977 MiB   @profile
     6                             def test():
     7   35.977 MiB    0.000 MiB       url = "https://httpbin.org/status/200"
     8   36.477 MiB    0.062 MiB       reqs=[grequests.get(url) for _ in range(100)]
     9   36.477 MiB    0.000 MiB       responses = grequests.imap(reqs, size=10)
    10  104.605 MiB  104.605 MiB       for resp in responses:
    11  104.605 MiB    0.000 MiB           ...
    12  104.613 MiB    0.008 MiB       print('ok') # memory should be freed by now
```
But if I modify the function to use a requests.Session object for its session...
```python
import requests
import grequests
from memory_profiler import profile

sesh = requests.Session()

@profile
def test():
    url = "https://httpbin.org/status/200"
    reqs = [grequests.get(url, session=sesh) for _ in range(500)]
    responses = grequests.imap(reqs, size=5)
    for resp in responses:
        ...
    print('ok')  # memory should be freed by now
```
With this change, there is not nearly as much buildup in memory (the amount depends partly on the pool size used; a bigger pool builds up more memory).
Also, now that we're using a session, increasing the number of requests no longer increases the amount of memory built up; it is the same for 100 or 500 requests.
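Part of why a shared session helps is that requests pools connections per session via `HTTPAdapter`. A sketch of configuring that pool explicitly (the sizes here are arbitrary illustrations, not recommendations):

```python
import requests
from requests.adapters import HTTPAdapter

session = requests.Session()
# One shared adapter; pool_maxsize should be at least the imap size you use,
# so concurrent greenlets don't create extra connections
adapter = HTTPAdapter(pool_connections=10, pool_maxsize=30)
session.mount("http://", adapter)
session.mount("https://", adapter)

# Every grequests.get(url, session=session) now reuses this one pool
print(type(session.get_adapter("http://example.com")).__name__)  # HTTPAdapter
```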
```
Line #    Mem usage    Increment   Line Contents
================================================
     5   36.090 MiB   36.090 MiB   @profile
     6                             def test():
     7   36.090 MiB    0.000 MiB       url = "https://httpbin.org/status/200"
     8   36.090 MiB    0.000 MiB       reqs=[grequests.get(url, session=sesh) for _ in range(500)]
     9   36.090 MiB    0.000 MiB       responses = grequests.imap(reqs, size=5)
    10   42.051 MiB   42.051 MiB       for resp in responses:
    11   42.051 MiB    0.000 MiB           ...
    12   42.059 MiB    0.008 MiB       print('ok') # memory should be freed by now
```
### Memory not freed due to references in the request list
Using the very first code example and profile from the previous section (which does not use a session), another issue with freeing memory appears:
```
    10  104.605 MiB  104.605 MiB       for resp in responses:
    11  104.605 MiB    0.000 MiB           ...
    12  104.613 MiB    0.008 MiB       print('ok') # memory should be freed by now
```
By the time `print('ok')` runs, the generator has been exhausted and the memory SHOULD have been freed, but it isn't. This is because the request list is still holding onto references, preventing garbage collection.
Adding `del reqs` allows the memory to be freed once the generator is exhausted.
```python
import grequests
from memory_profiler import profile

@profile
def test():
    url = "https://httpbin.org/status/200"
    reqs = [grequests.get(url) for _ in range(100)]
    responses = grequests.imap(reqs, size=5)
    del reqs
    for resp in responses:
        ...
    print('ok')  # memory should be freed by now
```
With the references from the request list removed, memory is now freed (more) properly.
```
Line #    Mem usage    Increment   Line Contents
================================================
     5   35.977 MiB   35.977 MiB   @profile
     6                             def test():
     7   35.977 MiB    0.000 MiB       url = "https://httpbin.org/status/200"
     8   36.477 MiB    0.062 MiB       reqs=[grequests.get(url) for _ in range(100)]
     9   36.477 MiB    0.000 MiB       responses = grequests.imap(reqs, size=5)
    10   36.477 MiB    0.000 MiB       del reqs
    11  104.176 MiB  104.176 MiB       for resp in responses:
    12  104.176 MiB    0.004 MiB           ...
    13   56.660 MiB    0.000 MiB       print('ok') # memory should be freed by now
```
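The effect of that `del` can be reproduced with plain Python objects: as long as a list holds the items, exhausting a generator over them frees nothing. A minimal sketch (no grequests involved; `Payload` is a made-up stand-in):

```python
import gc
import weakref

class Payload:
    """Stand-in for an AsyncRequest holding response data."""

def consume(items):
    for item in items:
        yield item

objs = [Payload() for _ in range(3)]
refs = [weakref.ref(o) for o in objs]

list(consume(objs))  # exhaust the generator, like the imap loop
gc.collect()
print(all(r() is not None for r in refs))  # True: the list still pins them

del objs  # the equivalent of `del reqs`
gc.collect()
print(all(r() is None for r in refs))  # True: now they can be collected
```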
### A remaining problem...

```
    13   56.660 MiB    0.000 MiB       print('ok') # memory should be freed by now
```
Notice that, while we are freeing some memory, not everything is released. Specifically, we end this function at 56 MiB, but it should be closer to the ~36 MiB we started with. This number increases with the number of requests (with 500 requests, ~86 MiB will be left).
Since you're already using a session, I think whatever is holding on to this little bit of memory that's building up is causing your memory leak. I'm still working on figuring out exactly what that is!
I have a partial fix for the initial issues in #138 -- but I don't think that will help your situation. Still working on it!
I have no idea about it. When I `del` the request list, RAM builds up more slowly, but it still increases.
```python
import grequests
from io import BytesIO
from PIL import Image

def main():
    r = (grequests.get(item[1]) for item in feed_id_url)  # item[1] means url
    for idx, i in enumerate(grequests.imap(r, size=30)):
        print(idx)
        if i.status_code == 200:
            try:
                img = Image.open(BytesIO(i.content))
            except Exception as e:
                print(e)
                continue
```
That can work: the RAM stops increasing. But when it reaches about idx == 1000000, it stops. I mean `ps -aux | grep python` shows the process still exists, but it stalls at idx == 1000000. The try/except catches nothing. It is very strange.
That does sound strange. Unfortunately, I have no idea why it would stop suddenly. I've tested locally with as many or more requests, and it never flat out stops.
I have run into similarly strange issues in the past, though. Perhaps consider updating/changing the version of gevent and/or the version of Python you're using and see if that changes anything. That's really just a guess, though.
@Usernamezhx this might be too late of a response, but can you try manually releasing the HTTP connection pool after processing each response?
```python
for idx, i in enumerate(grequests.imap(r, size=30)):
    if i.status_code == 200:
        i.raw._pool.close()
```
The reason you are hitting the limit at precisely 1,000,000 is `ulimit -n` - an upper limit on how many open file descriptors the operating system allows you.
Since you are not releasing the TCP pool and the TCP socket handle, the connection is left in a dangling state.
If you manually close it with `response.raw._pool.close()`, the TCP socket is released back to the OS and you can make as many connections as you want.
You can see how many open sockets you have by running `lsof -i | grep python | grep https`.
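That descriptor limit can also be checked from inside Python itself (POSIX only); a minimal sketch using the stdlib `resource` module:

```python
import resource

# The same numbers `ulimit -n` reports: soft limit, then hard limit.
# A long-running process hitting the soft limit will start failing
# to open new sockets or files.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(soft, hard)
```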
Also, in your very first example you have a variable `task` that contains a list of all AsyncRequests.
Keep in mind that each AsyncRequest holds a reference to its Response object, each Response references the raw response and the HTTPS connection pool, and each pool holds a TCP handle.
So as long as you carry around a list of AsyncRequests, the referenced Responses and the underlying TCP pools and connections will never be closed, garbage collected, or returned to the operating system.
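One pattern that follows from this: drain the task list as you consume responses, so no reference chain stays alive behind you. A sketch with a made-up `FakeResponse` stand-in (a real response would release its connection in `close()`):

```python
import gc
import weakref

class FakeResponse:
    """Stand-in for a response holding a pooled connection."""
    def close(self):
        pass  # a real response would release its connection here

tasks = [FakeResponse() for _ in range(5)]
refs = [weakref.ref(t) for t in tasks]

# Pop each item off as it is processed instead of keeping the full list
while tasks:
    resp = tasks.pop(0)
    resp.close()
    del resp  # drop the last reference before the next iteration

gc.collect()
print(sum(r() is None for r in refs))  # 5: every item became collectible
```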