aiohttp
aiohttp copied to clipboard
Memory leak when doing get() and scope closes
🐞 Describe the bug
When downloading content from a website, memory usage increases roughly proportionally to the size of the content downloaded, with some overhead for the library and other objects in memory.
When the scope where that content is downloaded is closed and inaccessible, I expect the garbage collector to de-allocate the objects which were downloaded, but memory usage appears to remain near the same level as peak memory usage right after downloads finish.
💡 To Reproduce
- Set up a default ClientSession
- Open a connection with any site
- Download content from that site
- Leave the scope in which that content was downloaded
- Examine memory usage during and after download, observing that memory usage has not significantly decreased
💡 Expected behavior Once the downloaded objects are no longer in scope, they should be de-allocated by the garbage collector and memory usage should fall back to a similar level as when the script initialized.
📋 Logs/tracebacks
I have no specific logs/tracebacks (no exception occurred), but I have profiled the relevant functions in two scripts with memory_profiler to demonstrate memory usage.
The first script is created with aiohttp, demonstrating the bug:
Filename: ./aiohttp_example.py
Line # Mem usage Increment Occurences Line Contents
============================================================
22 26.8 MiB 26.8 MiB 1 @profile
23 async def quicktest():
24 26.8 MiB 0.0 MiB 1 url = "https://www.hq.nasa.gov/alsj/a17/A17_FlightPlan.pdf"
25 26.8 MiB 0.0 MiB 1 session = aiohttp.ClientSession()
26 26.8 MiB 0.0 MiB 8 jobs = [grab_data(url, session) for _ in range(5)]
27 260.7 MiB 233.9 MiB 2 jobs_result = await asyncio.gather(*jobs)
28 260.7 MiB 0.0 MiB 1 if not all(jobs_result): # sanity check
29 _log.info("Something didn't give back data")
30 260.7 MiB 0.0 MiB 1 await session.close()
Filename: ./aiohttp_example.py
Line # Mem usage Increment Occurences Line Contents
============================================================
33 26.7 MiB 26.7 MiB 1 @profile
34 def main():
35 26.7 MiB 0.0 MiB 1 logging.basicConfig(level=logging.INFO, format=LOG_FORMAT)
36 26.8 MiB 0.1 MiB 1 loop = asyncio.get_event_loop()
37 26.8 MiB 0.0 MiB 1 try:
38 201.4 MiB 174.6 MiB 1 loop.run_until_complete(quicktest())
39 # memory usage should be back to normal
40 201.4 MiB 0.0 MiB 1 _log.info("Completed example")
41 except KeyboardInterrupt:
42 _log.info("shutting down...")
43 loop.stop()
The bug is observed here: On line 38, I expect memory usage to reach its peak, and the next line should see memory usage drop to init levels. However, on line 40, memory usage is still at the same level, even though the scope where the memory was allocated is gone and inaccessible.
The second is created using requests in a nearly identical way, but demonstrates the behavior I expect in aiohttp:
Filename: ./requests_example.py
Line # Mem usage Increment Occurences Line Contents
============================================================
19 23.9 MiB 23.9 MiB 1 @profile
20 def quicktest():
21 23.9 MiB 0.0 MiB 1 url = "https://www.hq.nasa.gov/alsj/a17/A17_FlightPlan.pdf"
22 23.9 MiB 0.0 MiB 1 session = requests.Session()
23 126.2 MiB 102.2 MiB 8 jobs_result = [grab_data(url, session) for _ in range(5)]
24 126.2 MiB 0.0 MiB 1 if not all(jobs_result): # sanity check
25 _log.info("Something didn't give back data")
26 126.2 MiB 0.0 MiB 1 session.close()
Filename: ./requests_example.py
Line # Mem usage Increment Occurences Line Contents
============================================================
29 23.9 MiB 23.9 MiB 1 @profile
30 def main():
31 23.9 MiB 0.0 MiB 1 logging.basicConfig(level=logging.INFO, format=LOG_FORMAT)
32 23.9 MiB 0.0 MiB 1 try:
33 27.4 MiB 3.5 MiB 1 quicktest()
34 # memory usage should be back to normal
35 27.4 MiB 0.0 MiB 1 _log.info("Completed example")
36 except KeyboardInterrupt:
37 _log.info("shutting down...")
Here the memory usage shown by memory_profiler is considerably different, because the memory usage drops as the function quicktest() on line 33 ends, so the scope of main() never sees the same level of memory usage.
📋 Your version of the Python
python --version
Python 3.8.3
📋 Your version of the aiohttp/yarl/multidict distributions
python -m pip show aiohttp
Name: aiohttp
Version: 3.6.2
Summary: Async http client/server framework (asyncio)
Home-page: https://github.com/aio-libs/aiohttp
Author: Nikolay Kim
Author-email: [email protected]
License: Apache 2
Location: /home/gregdan3/.local/share/virtualenvs/memory_leak_aiohttp-hbLDHJl6/lib/python3.8/site-packages
Requires: attrs, chardet, multidict, async-timeout, yarl
Required-by:
$ python -m pip show multidict
(memory_leak_aiohttp) gregdan3@Theros ~/g/c/memory_leak_aiohttp (master)>
python -m pip show multidict
Name: multidict
Version: 4.7.6
Summary: multidict implementation
Home-page: https://github.com/aio-libs/multidict
Author: Andrew Svetlov
Author-email: [email protected]
License: Apache 2
Location: /home/gregdan3/.local/share/virtualenvs/memory_leak_aiohttp-hbLDHJl6/lib/python3.8/site-packages
Requires:
Required-by: yarl, aiohttp
python -m pip show yarl
Name: yarl
Version: 1.4.2
Summary: Yet another URL library
Home-page: https://github.com/aio-libs/yarl/
Author: Andrew Svetlov
Author-email: [email protected]
License: Apache 2
Location: /home/gregdan3/.local/share/virtualenvs/memory_leak_aiohttp-hbLDHJl6/lib/python3.8/site-packages
Requires: multidict, idna
Required-by: aiohttp
📋 Additional context
The relevant scripts and my Pipfile* are here: scripts.zip
#!/usr/bin/env python
import asyncio
import logging
from concurrent.futures import ThreadPoolExecutor
import aiohttp
from memory_profiler import profile
_log = logging.getLogger(__name__)
LOG_FORMAT = (
"[%(asctime)s] [%(filename)22s:%(lineno)-4s] [%(levelname)8s] %(message)s"
)
async def grab_data(url, session):
async with session.get(url) as response:
data = await response.read()
return data
@profile
async def quicktest():
url = "https://www.hq.nasa.gov/alsj/a17/A17_FlightPlan.pdf"
session = aiohttp.ClientSession()
jobs = [grab_data(url, session) for _ in range(5)]
jobs_result = await asyncio.gather(*jobs)
if not all(jobs_result): # sanity check
_log.info("Something didn't give back data")
await session.close()
@profile
def main():
logging.basicConfig(level=logging.INFO, format=LOG_FORMAT)
loop = asyncio.get_event_loop()
try:
loop.run_until_complete(quicktest())
# memory usage should be back to normal
_log.info("Completed example")
except KeyboardInterrupt:
_log.info("shutting down...")
loop.stop()
if __name__ == "__main__":
main()
#!/usr/bin/env python
import logging
import requests
from memory_profiler import profile
_log = logging.getLogger(__name__)
LOG_FORMAT = (
"[%(asctime)s] [%(filename)22s:%(lineno)-4s] [%(levelname)8s] %(message)s"
)
def grab_data(url, session):
with session.get(url) as response:
data = response.content
return data
@profile
def quicktest():
url = "https://www.hq.nasa.gov/alsj/a17/A17_FlightPlan.pdf"
session = requests.Session()
jobs_result = [grab_data(url, session) for _ in range(5)]
if not all(jobs_result): # sanity check
_log.info("Something didn't give back data")
session.close()
@profile
def main():
logging.basicConfig(level=logging.INFO, format=LOG_FORMAT)
try:
quicktest()
# memory usage should be back to normal
_log.info("Completed example")
except KeyboardInterrupt:
_log.info("shutting down...")
if __name__ == "__main__":
main()
And my Pipfile:
[[source]]
name = "pypi"
url = "https://pypi.org/simple"
verify_ssl = true
[dev-packages]
[packages]
memory_profiler = {git = "ssh://[email protected]/pythonprofilers/memory_profiler.git"}
aiohttp = "*"
requests = "*"
I have not embedded my Pipfile.lock due to its length, but it is in the scripts.zip
above and demonstrates the same versions for my dependencies as I showed before.
All of my dependencies are listed above.
I created and tested the scripts above on my personal machine running Archlinux, but originally observed the bug that led me to create these scripts on a machine running CentOS.
uname --all
Linux HOSTNAME 5.6.15-arch1-1 #1 SMP PREEMPT Wed, 27 May 2020 23:42:26 +0000 x86_64 GNU/Linux
Linux HOSTNAME 3.10.0-862.9.1.el7.x86_64 #1 SMP Mon Jul 16 16:29:36 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
I'm having this exact same issue.
Any update on this? Would love to see this issue addressed!
experiencing the same issue
On Python 3.7.8 it's working fine, but with Python 3.8+ versions getting memory leak problem.
I hope someone solve this problem
Just did a check on my machine on 2a9d5e9e (master):
Filename: ./tests/4833_test.py
Line # Mem usage Increment Occurences Line Contents
============================================================
22 32.4 MiB 32.4 MiB 1 @profile
23 async def quicktest():
24 32.4 MiB 0.0 MiB 1 url = "http://localhost:8000/A17_FlightPlan.pdf"
25 32.4 MiB 0.0 MiB 1 session = aiohttp.ClientSession()
26 32.4 MiB 0.0 MiB 8 jobs = [grab_data(url, session) for _ in range(5)]
27 131.9 MiB 99.5 MiB 2 jobs_result = await asyncio.gather(*jobs)
28 131.9 MiB 0.0 MiB 1 if not all(jobs_result): # sanity check
29 _log.info("Something didn't give back data")
30 131.9 MiB 0.0 MiB 1 await session.close()
[2020-11-11 18:24:58,934] [ 4833_test.py:40 ] [ INFO] Completed example
Filename: ./tests/4833_test.py
Line # Mem usage Increment Occurences Line Contents
============================================================
33 32.2 MiB 32.2 MiB 1 @profile
34 def main():
35 32.4 MiB 0.2 MiB 1 logging.basicConfig(level=logging.INFO, format=LOG_FORMAT)
36 32.4 MiB 0.0 MiB 1 loop = asyncio.get_event_loop()
37 32.4 MiB 0.0 MiB 1 try:
38 33.2 MiB 0.8 MiB 1 loop.run_until_complete(quicktest())
39 # memory usage should be back to normal
40 33.2 MiB 0.0 MiB 1 _log.info("Completed example")
41 except KeyboardInterrupt:
42 _log.info("shutting down...")
43 loop.stop()
$ python --version
Python 3.8.5
$ uname -a
Linux 5.4.0-52-generic #57-Ubuntu SMP Thu Oct 15 10:57:00 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
$ lsb_release -a
Distributor ID: Ubuntu
Description: Ubuntu 20.04.1 LTS
Release: 20.04
Codename: focal
Same PC on aiohttp 3.7.2
Line # Mem usage Increment Occurences Line Contents
============================================================
22 26.7 MiB 26.7 MiB 1 @profile
23 async def quicktest():
24 26.7 MiB 0.0 MiB 1 url = "http://localhost:8000/A17_FlightPlan.pdf"
25 26.7 MiB 0.0 MiB 1 session = aiohttp.ClientSession()
26 26.7 MiB 0.0 MiB 8 jobs = [grab_data(url, session) for _ in range(5)]
27 126.2 MiB 99.5 MiB 2 jobs_result = await asyncio.gather(*jobs)
28 126.2 MiB 0.0 MiB 1 if not all(jobs_result): # sanity check
29 _log.info("Something didn't give back data")
30 126.2 MiB 0.0 MiB 2 await session.close()
[2020-11-11 18:32:41,084] [ 4833_test.py:40 ] [ INFO] Completed example
Filename: /home/dmitry/Sources/aiohttp/tests/4833_test.py
Line # Mem usage Increment Occurences Line Contents
============================================================
33 26.7 MiB 26.7 MiB 1 @profile
34 def main():
35 26.7 MiB 0.0 MiB 1 logging.basicConfig(level=logging.INFO, format=LOG_FORMAT)
36 26.7 MiB 0.0 MiB 1 loop = asyncio.get_event_loop()
37 26.7 MiB 0.0 MiB 1 try:
38 27.5 MiB 0.8 MiB 1 loop.run_until_complete(quicktest())
39 # memory usage should be back to normal
40 27.5 MiB 0.0 MiB 1 _log.info("Completed example")
41 except KeyboardInterrupt:
42 _log.info("shutting down...")
43 loop.stop()
$ python --version
Python 3.8.5
$ python -m pip show aiohttp
Name: aiohttp
Version: 3.7.2
Summary: Async http client/server framework (asyncio)
Home-page: https://github.com/aio-libs/aiohttp
Author: Nikolay Kim
Author-email: [email protected]
License: Apache 2
Location: /tmp/mem_test/venv/lib/python3.8/site-packages
Requires: multidict, yarl, attrs, typing-extensions, chardet, async-timeout
Required-by:
$ python -m pip show multidict
Name: multidict
Version: 5.0.0
Summary: multidict implementation
Home-page: https://github.com/aio-libs/multidict
Author: Andrew Svetlov
Author-email: [email protected]
License: Apache 2
Location: /tmp/mem_test/venv/lib/python3.8/site-packages
Requires:
Required-by: yarl, aiohttp
$ python -m pip show yarl
Name: yarl
Version: 1.6.2
Summary: Yet another URL library
Home-page: https://github.com/aio-libs/yarl/
Author: Andrew Svetlov
Author-email: [email protected]
License: Apache 2
Location: /tmp/mem_test/venv/lib/python3.8/site-packages
Requires: idna, multidict
Required-by: aiohttp
aiohttp doesn't use weak references to break cycles between moving parts.
In practice, it means that the memory is released not immediately but at the next call of the garbage collector.
You can enforce the gc by pushing gc.collect()
just after loop.run_until_complete(quicktest())
.
Does it help?
@asvetlov in my environment it is not needed. In the main
function there is no memory leak.
May be it was a problem in older Python 3.8 or may be Python was patched somehow in a CentOS.
I will check later this leak in CentOS docker.
OK. BTW, for checking on different Python versions I use https://github.com/pyenv/pyenv
pyenv install 3.9.0
pyenv virtualenv 3.9.0 aiohttp-3.9.0
pyenv local aiohttp-3.9.0
Works like a charm.
P.S. Need to initialize PATH and PYENV_ROOT in bashrc/zshrc first as described in pyenv installation section first.
For CentOS I've got the leak :-(
Line # Mem usage Increment Occurences Line Contents
============================================================
22 28.3 MiB 28.3 MiB 1 @profile
23 async def quicktest():
24 28.3 MiB 0.0 MiB 1 url = "http://localhost:8000/A17_FlightPlan.pdf"
25 28.3 MiB 0.0 MiB 1 session = aiohttp.ClientSession()
26 28.3 MiB 0.0 MiB 8 jobs = [grab_data(url, session) for _ in range(5)]
27 128.0 MiB 99.7 MiB 2 jobs_result = await asyncio.gather(*jobs)
28 128.0 MiB 0.0 MiB 1 if not all(jobs_result): # sanity check
29 _log.info("Something didn't give back data")
30 128.0 MiB 0.0 MiB 2 await session.close()
[2020-11-13 12:52:31,871] [ 4833_test.py:40 ] [ INFO] Completed example
Filename: ./4833_test.py
Line # Mem usage Increment Occurences Line Contents
============================================================
33 28.1 MiB 28.1 MiB 1 @profile
34 def main():
35 28.3 MiB 0.1 MiB 1 logging.basicConfig(level=logging.INFO, format=LOG_FORMAT)
36 28.3 MiB 0.0 MiB 1 loop = asyncio.get_event_loop()
37 28.3 MiB 0.0 MiB 1 try:
38 68.7 MiB 40.4 MiB 1 loop.run_until_complete(quicktest())
39 # memory usage should be back to normal
40 68.7 MiB 0.0 MiB 1 _log.info("Completed example")
41 except KeyboardInterrupt:
42 _log.info("shutting down...")
43 loop.stop()
Dockerfile:
FROM centos
RUN yum install -y python3
RUN yum groupinstall -y 'Development Tools'
RUN yum install -y python3-devel
RUN pip3 install memory_profiler
RUN yum install -y wget
RUN wget https://www.hq.nasa.gov/alsj/a17/A17_FlightPlan.pdf
RUN pip3 install aiohttp==3.7.2
ADD 4833_test.py /
$ python3 --version
Python 3.6.8
Oooh. Thanks. At least I have comprehensive instruction for "how to reproduce".
Having a memory leak in one of my aiohttp apps, I've been following this closely. If memory_profiler is to be trusted, this still happens with aiohttp 3.7.x and cpython 3.9.4.
I've tried the following with no change to the apparent leak reported by memory_profiler:
- Manually calling
gc.collect()
before or after stopping the event loop.- Also adding an asyncio.sleep() before or after the collect.
- Encapsulating the behavior in
asyncio.run()
. - Avoiding context managers and manually calling
__aenter__
and__aexit__
on the request context. - Using non-SSL/TLS connections
- Using
AsyncResolver
instead of the loop/OS/threadpool-based one. - Avoiding DNS entirely and just hitting 127.0.0.1.
- Managing the
Tasks
directly instead of spawning withgather
. - Using sequenced
awaits
instead of using concurrent tasks. - Using the
AIOHTTP_NO_EXTENSIONS
environment variable.
There are dependencies between a HttpResponseParser
, Connection
, ClientResponse
, ResponseHandler
, and Connector
objects, some of them are circular. However, as of Python 3.4, the garbage collector should be able to collect them all as long as nothing refers to one of them after running all of their finalizers (like __del__(…)
).
I experience the same problem with aiohttp 3.8.1 and python 3.7.1
Here's some bells and whistles to make spotting the issue easier:
https://github.com/gregdan3/aiohttp-memleak