
The memory usage piles up over time and leads to OOM

Open prav2019 opened this issue 4 years ago • 75 comments

First check

  • [x] I added a very descriptive title to this issue.
  • [x] I used the GitHub search to find a similar issue and didn't find it.
  • [x] I searched the FastAPI documentation, with the integrated search.
  • [x] I already searched in Google "How to X in FastAPI" and didn't find any information.
  • [x] I already read and followed all the tutorial in the docs and didn't find an answer.
  • [ ] I already checked if it is not related to FastAPI but to Pydantic.
  • [ ] I already checked if it is not related to FastAPI but to Swagger UI.
  • [ ] I already checked if it is not related to FastAPI but to ReDoc.
  • [x] After submitting this, I commit to one of:
    • Read open issues with questions until I find 2 issues where I can help someone and add a comment to help there.
    • I already hit the "watch" button in this repository to receive notifications and I commit to help at least 2 people that ask questions in the future.
    • Implement a Pull Request for a confirmed bug.

Example

Here's a self-contained, minimal, reproducible, example with my use case:

from fastapi import FastAPI

app = FastAPI()


@app.get("/")
def read_root():
    return {"Hello": "World"}

Description

  • Open the browser and call the endpoint /.
  • It returns a JSON with {"Hello": "World"}.
  • But I expected it to return {"Hello": "Sara"}.

Environment

  • OS: [e.g. Linux / Windows / macOS]:
  • FastAPI Version [e.g. 0.3.0]:

To know the FastAPI version use:

python -c "import fastapi; print(fastapi.__version__)"
  • Python version:

To know the Python version use:

python --version

Additional context

Tracemalloc gave insight into the lines that are the top consumers of memory (the top one seems to be the line below, in uvicorn): /usr/local/lib/python3.6/site-packages/uvicorn/main.py:305: loop.run_until_complete(self.serve(sockets=sockets))

prav2019 avatar Jun 25 '20 14:06 prav2019

I faced the same issue, @prav2019

Any solution to overcome that, @Riki-1mg?

prav2019 avatar Jun 25 '20 18:06 prav2019

No @prav2019, are you using aiohttp as an HTTP client in your service? https://github.com/tiangolo/fastapi/issues/1623

Riki-1mg avatar Jun 25 '20 22:06 Riki-1mg

No @Riki-1mg

prav2019 avatar Jun 26 '20 01:06 prav2019

@app.get("/")
def read_root():
  return {"Hello": "World"}

Surely the expected behaviour here is to return {"Hello": "World"}?

If you want this function to return {"Hello": "Sara"} you'd probably need to do something like:

@app.get("/")
def read_root():
  return {"Hello": "Sara"}

Further, I can't reproduce your error on my "machine" (it's sitting on a cloud somewhere). You can see the full details here but everything looks to be working fine.

I suspect that this is specific to your operating system setup, etc. Would you please provide some more info that would be useful for reproducing the error?

  • How much RAM does your machine have?
  • How much memory is each function using (ideally include all the debugging output)? This will look something like the below (example from the Python documentation)
[ Top 10 ]
<frozen importlib._bootstrap>:716: size=4855 KiB, count=39328, average=126 B
<frozen importlib._bootstrap>:284: size=521 KiB, count=3199, average=167 B
/usr/lib/python3.4/collections/__init__.py:368: size=244 KiB, count=2315, average=108 B
/usr/lib/python3.4/unittest/case.py:381: size=185 KiB, count=779, average=243 B
/usr/lib/python3.4/unittest/case.py:402: size=154 KiB, count=378, average=416 B
/usr/lib/python3.4/abc.py:133: size=88.7 KiB, count=347, average=262 B
<frozen importlib._bootstrap>:1446: size=70.4 KiB, count=911, average=79 B
<frozen importlib._bootstrap>:1454: size=52.0 KiB, count=25, average=2131 B
<string>:5: size=49.7 KiB, count=148, average=344 B
/usr/lib/python3.4/sysconfig.py:411: size=48.0 KiB, count=1, average=48.0 KiB
  • Does your out-of-memory error include a traceback? Please include that if that's the case.

teymour-aldridge avatar Jun 26 '20 08:06 teymour-aldridge
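For anyone wanting to produce the kind of output requested above, here is a minimal sketch using only the standard library's tracemalloc module; the placement of the calls and the request traffic are placeholders, not part of the original report.

# Minimal tracemalloc sketch: start tracing, let the app handle some traffic,
# then print the top allocation sites (similar to the "[ Top 10 ]" output above).
import tracemalloc

tracemalloc.start()

# ... run the FastAPI app and send it some requests here ...

snapshot = tracemalloc.take_snapshot()
for stat in snapshot.statistics("lineno")[:10]:
    print(stat)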

@teymour-aldridge: here are the debug statistics:

Memory Statistics Top 10 Files
/usr/local/lib/python3.6/site-packages/uvicorn/main.py:305: size=1652 KiB (+1499 KiB), count=4597 (+4172), average=368 B
/usr/local/lib/python3.6/site-packages/starlette/applications.py:136: size=1288 KiB (+1173 KiB), count=2290 (+2086), average=576 B
/usr/local/lib/python3.6/threading.py:347: size=943 KiB (+854 KiB), count=1836 (+1657), average=526 B
/usr/local/lib/python3.6/queue.py:145: size=919 KiB (+835 KiB), count=1783 (+1619), average=528 B
/usr/local/lib/python3.6/asyncio/locks.py:233: size=885 KiB (+807 KiB), count=9633 (+8771), average=94 B
/usr/local/lib/python3.6/site-packages/uvicorn/protocols/http/httptools_impl.py:82: size=788 KiB (+717 KiB), count=6876 (+6264), average=117 B
/usr/local/lib/python3.6/site-packages/uvicorn/protocols/http/httptools_impl.py:77: size=751 KiB (+684 KiB), count=2289 (+2086), average=336 B
/usr/local/lib/python3.6/site-packages/uvicorn/protocols/http/httptools_impl.py:146: size=725 KiB (+662 KiB), count=15984 (+14611), average=46 B
/usr/local/lib/python3.6/site-packages/sqlalchemy/engine/result.py:376: size=657 KiB (+590 KiB), count=10490 (+9426), average=64 B
/usr/local/lib/python3.6/site-packages/uvicorn/protocols/http/httptools_impl.py:285: size=609 KiB (+555 KiB), count=4589 (+4183), average=136 B

Scanned lines that consume the most memory:

uvicorn/main.py: loop.run_until_complete(self.serve(sockets=sockets))
starlette/applications.py: scope["app"] = self
python3.6/threading.py: waiters_to_notify = _deque(_islice(all_waiters, n))
python3.6/queue.py: self.not_empty.notify()
asyncio/locks.py: self._waiters = collections.deque()
http/httptools_impl.py: self.parser = httptools.HttpRequestParser(self)
http/httptools_impl.py: self.config = config
http/httptools_impl.py: self.parser.feed_data(data)
engine/result.py: for obj_elem in elem[4]
http/httptools_impl.py: self.timeout_keep_alive, self.timeout_keep_alive_handler

prav2019 avatar Jun 26 '20 20:06 prav2019

@teymour-aldridge the above grows and causes OOM after a while

prav2019 avatar Jun 26 '20 20:06 prav2019

@prav2019 I can't reproduce the bug on either my machine or on a cloud-hosted Linux container; this leads me to believe that the problem is in the way your machine/environment is set up.

In the issue template, it asks for the following fields – would you mind filling them in?

OS: [e.g. Linux / Windows / macOS]:
FastAPI Version [e.g. 0.3.0]:

Also, there's a "checklist" at the top of the issue which you should fill out!

teymour-aldridge avatar Jun 26 '20 20:06 teymour-aldridge

@teymour-aldridge, this usually happens when there is some traffic over a period of time, and it is usually happening in our prod environment. OS: using the python 3.6 Docker image (so Debian Linux). FastAPI version: fastapi[all]==0.20.0

prav2019 avatar Jun 26 '20 23:06 prav2019

@teymour-aldridge, current status:

Top 10 Files
/usr/local/lib/python3.6/site-packages/uvicorn/main.py:305: size=3650 KiB (+1922 KiB), count=10158 (+5349), average=368 B
/usr/local/lib/python3.6/site-packages/starlette/applications.py:136: size=2851 KiB (+1504 KiB), count=5069 (+2673), average=576 B
/usr/local/lib/python3.6/threading.py:347: size=2104 KiB (+1110 KiB), count=4083 (+2155), average=528 B
/usr/local/lib/python3.6/queue.py:145: size=2049 KiB (+1087 KiB), count=3974 (+2109), average=528 B
/usr/local/lib/python3.6/asyncio/locks.py:233: size=1948 KiB (+1017 KiB), count=21295 (+11208), average=94 B
/usr/local/lib/python3.6/site-packages/uvicorn/protocols/http/httptools_impl.py:82: size=1744 KiB (+920 KiB), count=15226 (+8037), average=117 B
/usr/local/lib/python3.6/site-packages/uvicorn/protocols/http/httptools_impl.py:77: size=1663 KiB (+877 KiB), count=5068 (+2673), average=336 B
/usr/local/lib/python3.6/site-packages/uvicorn/protocols/http/httptools_impl.py:146: size=1598 KiB (+839 KiB), count=35181 (+18488), average=47 B
/usr/local/lib/python3.6/site-packages/uvicorn/protocols/http/httptools_impl.py:285: size=1349 KiB (+712 KiB), count=10163 (+5364), average=136 B
/usr/local/lib/python3.6/site-packages/uvicorn/protocols/http/httptools_impl.py:216: size=1288 KiB (+676 KiB), count=30112 (+15818), average=44 B

prav2019 avatar Jun 26 '20 23:06 prav2019

0.20.0 is a somewhat old version of FastAPI, what happens if you use the latest release instead?

teymour-aldridge avatar Jun 27 '20 06:06 teymour-aldridge

@teymour-aldridge I haven't tried the latest version, but I'd like to give it a shot if any memory fixes were made in versions after 0.20.0.

prav2019 avatar Jun 27 '20 23:06 prav2019

@teymour-aldridge updated to the latest version, will report back once I see the result. Thanks

prav2019 avatar Jun 29 '20 17:06 prav2019

@teymour-aldridge I checked on our staging: after updating to the latest versions [FastAPI and uvicorn], the memory issue still exists

prav2019 avatar Jun 30 '20 18:06 prav2019

@teymour-aldridge can this one work? I saw this in the uvicorn documentation: --limit-max-requests - Maximum number of requests to service before terminating the process. Useful when running together with a process manager, for preventing memory leaks from impacting long-running processes.

prav2019 avatar Jun 30 '20 18:06 prav2019
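As a rough sketch of that option (an assumption about how it could be wired up, not a confirmed fix for this leak), the same cap can also be set when running uvicorn programmatically; the limit of 1000 below is arbitrary.

# Sketch: cap the number of requests a worker process serves so a process manager
# can recycle it; mirrors the --limit-max-requests CLI flag quoted above.
import uvicorn
from fastapi import FastAPI

app = FastAPI()

@app.get("/")
def read_root():
    return {"Hello": "World"}

if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=8000, limit_max_requests=1000)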

@prav2019 I don't know, it might do. How many requests are you handling ~and how many machines do you have~?

teymour-aldridge avatar Jun 30 '20 19:06 teymour-aldridge

@teymour-aldridge or I could try adding gunicorn!

prav2019 avatar Jun 30 '20 21:06 prav2019

I'm running into the same issue - memory usage slowly builds over time, running on gunicorn with 4 uvicorn workers

curtiscook avatar Aug 06 '20 23:08 curtiscook

+1, seeing the same issue

fastapi==0.55.1
uvicorn==0.11.5
gunicorn==19.10.0

Gunicorn + uvicorn worker class

erikreppel avatar Aug 10 '20 18:08 erikreppel

Reading through the uvicorn code... adding max-requests effectively just restarts the server as soon as you hit some arbitrary number of requests?

https://github.com/encode/uvicorn/blob/e77e59612ecae4ac10f9be18f18c47432be7909a/uvicorn/main.py#L537-L539

I can't find any good documentation on what this number should be, either: 500? 1000? 10k? 100k?

If anyone has any experience/advice here, I'm all ears

curtiscook avatar Aug 10 '20 19:08 curtiscook

@curtiscook The max-requests setting restarts the service completely, so we need to configure workers to always keep one running while we restart another. That solved the memory issue, but got me into another one: now I sometimes get multiple requests to workers with the same data, and each worker creates a new entry in the database.

prav2019 avatar Aug 12 '20 02:08 prav2019
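For reference, one way to recycle workers without taking the whole service down at once is gunicorn's max_requests/max_requests_jitter pair, which staggers the restart point per worker. A minimal gunicorn_conf.py sketch (the values are arbitrary, and this works around the leak rather than fixing it):

# Hypothetical gunicorn_conf.py: each worker is recycled after roughly max_requests
# requests, with jitter so the workers do not all restart at the same moment.
workers = 4
worker_class = "uvicorn.workers.UvicornWorker"
max_requests = 1000        # arbitrary; recycle a worker after this many requests
max_requests_jitter = 100  # randomize the restart point per worker
bind = "0.0.0.0:8000"

It would be launched with something like gunicorn -c gunicorn_conf.py main:app.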

@prav2019 So what exactly solved your OOM issue, was it setting the max-requests?

drisspg avatar Aug 19 '20 16:08 drisspg

Hi,

I actually have not solved my memory leak issue, but it's small enough not to be a huge concern. I'm also seeing the memory leak in other async processes, so it might be an issue with long-running event loops in async Python?

@curtiscook The max-requests setting restarts the service completely, so we need to configure workers to always keep one running while we restart another. That solved the memory issue, but got me into another one: now I sometimes get multiple requests to workers with the same data, and each worker creates a new entry in the database.

That's what I thought it might do. Not really a great solution then :(

curtiscook avatar Aug 19 '20 20:08 curtiscook

I would have thought it would be pretty tricky to have a memory leak in Python. Perhaps an underlying issue in the interpreter (C is very good for memory leaks :D) or a C extension module?

Anyway, this has piqued my interest, so I'll try and investigate a little to see what's causing the problem.

Is this only happening in Docker containers, or can it be reproduced across a number of devices? I'm not experiencing this issue on my machine, having left a FastAPI process running for a few days (nothing unusual happened).

It's generally a good move to use ephemeral (short-lived) processes to run applications and regularly recycle them in order to reduce the risk/impact of a memory leak (which tends to build up over time).

teymour-aldridge avatar Aug 19 '20 20:08 teymour-aldridge

I haven't tested outside of docker containers/ heroku docker containers

curtiscook avatar Aug 20 '20 18:08 curtiscook

Ah. I'll try and do some profiling. Unfortunately my time is pretty scarce these days with the number of different projects I'm working on but fingers crossed.

teymour-aldridge avatar Aug 20 '20 19:08 teymour-aldridge

+1

image

python 3.6
fastapi==0.60.1
uvicorn==0.11.3

uvicorn main:app --host 0.0.0.0 --port 8101 --workers 4
docker:2 core 2GB memory,CentOS Linux release 7.8.2003 (Core)

The client calls the endpoint below once per minute, and server memory usage slowly builds over time.

...
from fastapi import BackgroundTasks
...

@router.get('/tsp/crontab')
def tsp_crontab_schedule(topic: schemas.AgentPushParamsEnum,
                         background_tasks: BackgroundTasks,
                         api_key: str = Header(...)):
    crontab = CrontabMain()

    if topic == topic.schedule_per_minute:
        background_tasks.add_task(crontab.schedule_per_minute)

binbinah avatar Aug 21 '20 03:08 binbinah

update:

I added async before def and it worked
refer: https://github.com/tiangolo/fastapi/issues/596#issuecomment-647704509

@router.get('/tsp/crontab')
def tsp_crontab_schedule(topic: schemas.AgentPushParamsEnum..)
    pass

to

@router.get('/tsp/crontab')
async def tsp_crontab_schedule(topic: schemas.AgentPushParamsEnum..)
    pass

binbinah avatar Aug 21 '20 04:08 binbinah

https://github.com/tiangolo/fastapi/issues/1624#issuecomment-676517603 . The max-requests setting fixed the OOM, as it restarts the server, but it opened up a lot of concurrency issues.

prav2019 avatar Nov 09 '20 23:11 prav2019

Did replacing def with async def help? Because when I tried it long back, it didn't!

prav2019 avatar Nov 10 '20 19:11 prav2019

update:

I added async before def and it worked. refer: #596 (comment)

@router.get('/tsp/crontab')
def tsp_crontab_schedule(topic: schemas.AgentPushParamsEnum..)
    pass

to

@router.get('/tsp/crontab')
async def tsp_crontab_schedule(topic: schemas.AgentPushParamsEnum..)
    pass

Either way, you shouldn't have expanding memory. Running the route with def is supported and is meant to process requests in a threadpool?

Did replacing def with async def help? Because when I tried it long back, it didn't!

I've always been running async

curtiscook avatar Nov 17 '20 02:11 curtiscook

I am running this in docker with:

  • python==3.8.6
  • fastapi==0.61.1
  • uvicorn==0.11.8

and I have been seeing the same issue for some time. If I understand this thread, the recommendation seems to be to restart the uvicorn worker(s) periodically, and never run uvicorn alone without a process manager. I guess it's also better to transfer attention to uvicorn? Or perhaps just not use uvicorn.

AdolfVonKleist avatar Nov 23 '20 08:11 AdolfVonKleist

Related to #596, that issue already contains a lot of workarounds and information, please follow the updates over there.

ycd avatar Nov 23 '20 08:11 ycd

Related to #596, that issue already contains a lot of workarounds and information, please follow the updates over there.

It looks like #596 is due to using def instead of async def? I've seen this with async def.

curtiscook avatar Nov 24 '20 03:11 curtiscook

@curtiscook no, my point was that there are already a lot of workarounds; keeping them in one place would be better for future reference, so developers who face the same problem in the future can search for a solution in one place that fits their case.

Even if most answers are about using async def instead of def, the title and the context are very clear, actually, because the issue wasn't about using def instead of async def in our case. My colleague opened that issue more than one year ago and we're no longer facing memory problems, and we're not using async def either.

ycd avatar Nov 24 '20 10:11 ycd

Ran into the same error: def -> memory leak, async def -> no memory leak. Thanks @curtiscook

delijati avatar Nov 27 '20 14:11 delijati
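If converting everything to async def is not an option because a handler does blocking work, one pattern is to keep the endpoint async and push only the blocking call into the threadpool explicitly. A sketch, where blocking_io is a placeholder for real blocking work:

# Sketch: async endpoint that offloads a single blocking call instead of declaring
# the whole route with def.
import time

from fastapi import FastAPI
from fastapi.concurrency import run_in_threadpool

app = FastAPI()

def blocking_io() -> str:
    time.sleep(1)  # stands in for a blocking library call
    return "done"

@app.get("/work")
async def do_work():
    result = await run_in_threadpool(blocking_io)
    return {"result": result}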

Same issue here. Did anybody find a good workaround in the meantime?

lmssdd avatar Mar 15 '21 12:03 lmssdd

What version of Python are you using @lmssdd ?

ycd avatar Mar 16 '21 20:03 ycd

What version of Python are you using @lmssdd ?

@ycd it's 3.8.5, running on a DigitalOcean droplet. It's on gunicorn with uvicorn workers.

lmssdd avatar Mar 17 '21 08:03 lmssdd

Ran into the same error: def -> memory leak, async def -> no memory leak. Thanks @curtiscook

Confirming this worked for me. Oddly, I only had the problem on EC2 (deep learning AMI, CPU only). Running on my local machine worked fine.

talolard avatar Apr 20 '21 20:04 talolard

Python Version: 3.8.9 FastAPI Version: 0.67.0 Environment: Linux 5.12.7 x86_64

We are able to consistently produce a memory leak by using a synchronous Depends:

from fastapi import FastAPI, Body, Depends
import typing
import requests

app = FastAPI()

def req() -> bool: 
    r = requests.get("https://google.com")
    return True

@app.post("/")
def root(payload: list = Body(...), got: bool = Depends(req)):
    return payload

This is resolved by switching both endpoint and depends to async def. This took us a while to hunt down. At first we also thought it only occurred on EC2, but that's because we were disabling our authentication routines for local testing, which is where the issue was located. For those struggling here: check your depends, if you've got them.

imw avatar Jul 27 '21 11:07 imw
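For illustration, the fully async variant described above could look like the sketch below, with httpx used as a stand-in async HTTP client (an assumption; the original snippet used requests):

# Sketch: both the dependency and the endpoint are async, and the outbound request
# uses an async client instead of the blocking requests.get call.
import httpx
from fastapi import Body, Depends, FastAPI

app = FastAPI()

async def req() -> bool:
    async with httpx.AsyncClient() as client:
        await client.get("https://google.com")
    return True

@app.post("/")
async def root(payload: list = Body(...), got: bool = Depends(req)):
    return payload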

Python Version: 3.8.9 FastAPI Version: 0.67.0 Environment: Linux 5.12.7 x86_64

We are able to consistently produce a memory leak by using a synchronous Depends:

from fastapi import FastAPI, Body, Depends
import typing
import requests

app = FastAPI()

def req() -> bool: 
    r = requests.get("https://google.com")
    return True

@app.post("/")
def root(payload: list = Body(...), got: bool = Depends(req)):
    return payload

This is resolved by switching both endpoint and depends to async def. This took us a while to hunt down. At first we also thought it only occurred on EC2, but that's because we were disabling our authentication routines for local testing, which is where the issue was located. For those struggling here: check your depends, if you've got them.

I think this might be my issue as well (although I've been ignoring it for a while). My endpoint is async, but my depends is sync.

curtiscook avatar Jul 29 '21 03:07 curtiscook

I'm also having ghastly memory issues. image

my code is fully open source, so feel free to peek: https://github.com/daggy1234/dagpi-image

daggy1234 avatar Aug 10 '21 13:08 daggy1234

Ran into the same error: def -> memory leak, async def -> no memory leak. Thanks @curtiscook

@delijati @curtiscook is the fix literally changing def fn_name to async def fn_name?

In our case, we obviously use async functions whenever we are doing something asynchronous, such as calling a REST or GraphQL endpoint, etc.

However, we also have a ton of methods where nothing async is happening. Do you suggest turning them into async as well, even with no await in them?

What about the class constructors (def __init__)?

munjalpatel avatar Aug 11 '21 01:08 munjalpatel

@munjalpatel [1] there you can see that every route that is not async will be executed in a separate thread... the problem is that by default it uses a thread pool, and this uses up to min(32, os.cpu_count() + 4) workers [2], so I assume that on some Python versions these workers are not reused or released and you end up increasing memory. I wrote a little test app to demonstrate that. [3]

The run_in_threadpool implementation from [1] comes from starlette 0.14.2 [4] (fastapi is pinned to that version), but they changed their code to anyio [5]. I just looked briefly into the anyio code, but to me it looks like an update to the new starlette/anyio version could fix that memory problem. But :man_shrugging: who knows if fastapi will update to the new starlette.

[1] https://github.com/tiangolo/fastapi/blob/master/fastapi/routing.py#L144 [2] https://docs.python.org/3/library/concurrent.futures.html#threadpoolexecutor [3] https://github.com/tiangolo/fastapi/issues/596#issuecomment-734880855 [4] https://github.com/encode/starlette/blob/0.14.2/starlette/concurrency.py#L27 [5] https://github.com/encode/starlette/blob/master/starlette/concurrency.py#L27

delijati avatar Aug 12 '21 11:08 delijati
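To make the sizing claim above concrete, min(32, os.cpu_count() + 4) is the documented default worker cap for ThreadPoolExecutor since Python 3.8; a tiny check:

# Prints the thread cap a default ThreadPoolExecutor would use on this machine,
# matching the min(32, os.cpu_count() + 4) rule quoted above.
import os

default_workers = min(32, (os.cpu_count() or 1) + 4)
print(f"Default ThreadPoolExecutor cap on this machine: {default_workers} threads")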

Thanks @delijati, I'll do some experiments with this as well. We are using amd64/pypy:3.7 if that matters.

This is a serious issue for us: every FastAPI-based pod eventually gets OOMed and restarted.

image

munjalpatel avatar Aug 12 '21 14:08 munjalpatel

Just verified that, for us, moving functions to async still doesn't help.

For example, this pod has almost all methods as async and we still see memory creeping.

image

munjalpatel avatar Aug 12 '21 14:08 munjalpatel

I'm creating a ThreadPoolExecutor in the background job. After execution completes, the threads do get closed with all futures resolved, but the process creating the threads just doesn't close and keeps taking up memory. image

Procfile -> gunicorn -w 1 -k uvicorn.workers.UvicornWorker src:app --log-level info
deps: uvicorn==0.14.0 fastapi==0.66.0

Any suggestion would be helpful here.

codejunction avatar Aug 15 '21 18:08 codejunction
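One thing worth checking in a setup like that (a general suggestion, not a confirmed fix): creating the executor as a context manager inside the background job guarantees its threads are shut down once the futures resolve. A sketch with a placeholder work function:

# Sketch: scope the executor to the background job so it is shut down when the
# work finishes; process_item stands in for the real job.
from concurrent.futures import ThreadPoolExecutor

def process_item(item: int) -> int:
    return item * 2

def background_job(items: list) -> list:
    # Leaving the "with" block calls executor.shutdown(wait=True).
    with ThreadPoolExecutor(max_workers=4) as executor:
        return list(executor.map(process_item, items))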

I'm also having this issue and I've attributed it to uvicorn workers, so I opened this issue: https://github.com/encode/uvicorn/issues/1226

KiraPC avatar Oct 21 '21 10:10 KiraPC

Yeah, it's getting pretty unusable for us too due to this issue. It's pretty major, and whilst we love FastAPI, we're debating whether to switch to a different framework to fix this issue.

maitham avatar Nov 02 '21 13:11 maitham

FWIW -- you may have luck switching to hypercorn workers (plus you get http/2)

curtiscook avatar Nov 02 '21 17:11 curtiscook

We're also experiencing this issue using Starlette <=0.14.2. We've got a few services using the same setup of Docker + Starlette, and they all eventually go OOM and get killed.

sostholmwork avatar Nov 18 '21 08:11 sostholmwork

I am also getting memory issues. Tested out FastAPI + hypercorn but the issue remains.

evaldask avatar Nov 18 '21 09:11 evaldask

I have implemented a hacky way to restart workers, but I don't think it is a good idea to restart services. Waiting for a solution, so that I can remove the restart logic and not have to care about this weird OOM issue.

prav2019 avatar Nov 19 '21 23:11 prav2019

In my case:

  • I tried multiple memory profiling tools, but they didn't work well with such a complex application as FastAPI
  • tracemalloc did partly work, but it didn't report the memory correctly. I was able to track down another memory leak due to not closing a tempfile correctly. However, this time it's different. I guess it had trouble working with asyncio and ThreadPoolExecutor.

Finally, I was able to track down the memory leak by simply commenting out my code block by block. Turns out, it was due to this snippet at the top of main.py:

from starlette.exceptions import HTTPException as StarletteHTTPException
from fastapi.responses import PlainTextResponse
@app.exception_handler(StarletteHTTPException)
async def http_exception_handler(request, exc):
    return PlainTextResponse(str(exc.detail), status_code=exc.status_code)

By converting that async function to a normal function, the memory stopped leaking (I used locust to spawn 4000 users uploading an audio file of 20 seconds):

from starlette.exceptions import HTTPException as StarletteHTTPException
from fastapi.responses import PlainTextResponse
@app.exception_handler(StarletteHTTPException)
def http_exception_handler(request, exc):
    return PlainTextResponse(str(exc.detail), status_code=exc.status_code)

lamnguyenx avatar Dec 20 '21 10:12 lamnguyenx

@lamnguyenx After converting the async http_exception_handler function to a normal function, the issue remains. image

yangxuhui avatar Dec 30 '21 05:12 yangxuhui

@lamnguyenx After converting the async http_exception_handler function to a normal function, the issue remains.

image

Could you try removing the custom http exception handling for now, and then re-run the load testing?

Anyway, I did see that the graph looks a bit less steep (the orange region) after you applied the workaround. Maybe some other things caused the leak too?

lamnguyenx avatar Dec 30 '21 05:12 lamnguyenx

I did encounter this issue last week; the root cause looked to be mismatched Pydantic types. For instance, we had an int defined in a response model that was actually a float when returned from our database. We also had an int that was also a str. Cleaning up the types solved the issue for us. This was a high-traffic endpoint at < 100 rps.

I'm not sure of the root cause, but I suspect that the errors are caught and recorded in Pydantic somewhere, I suspect here, as FastAPI validates returned responses here.

samjacobclift avatar Jan 24 '22 10:01 samjacobclift
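A minimal illustration of the kind of mismatch described above (the model and values are made up): the response_model declares an int while the data source actually yields a float, so every response forces extra validation/coercion work.

# Illustrative only: a response_model field typed as int while the "database"
# value is really a float - the shape of mismatch described in the comment above.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class Item(BaseModel):
    price: int  # declared as int, but the stored value is a float

@app.get("/item", response_model=Item)
def get_item():
    return {"price": 19.99}  # pretend this came from the database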

I'm wondering if the run_in_threadpool(field.validate, ...) call here in combination with the global EXC_TYPE_CACHE used by Pydantic's validation could be contributing.

jacksmith15 avatar Feb 10 '22 11:02 jacksmith15

Any updates on this?

I'm using:

python = 3.8.2 uvicorn = 0.14.0 fastapi = 0.65.2

I changed all my routes to async and the issue persists. My app starts at 300MB and quickly jumps to 1400MB after a couple of requests and never goes back.

robertop23 avatar Feb 23 '22 20:02 robertop23

I have the same issue. The pod only has 2G RAM, so the pod restarts over and over again...

Astral1020 avatar Mar 02 '22 08:03 Astral1020

Are there any valid workarounds for this? And is this truly a FastAPI issue or a uvicorn issue? There are similar issues in both projects, but it seems like there is no one concrete answer.

amyruski avatar Mar 19 '22 17:03 amyruski

I am also facing a similar issue. All my api endpoints are defined with async.

One thing I have observed, though: when I comment out my background tasks (which mainly consist of database update queries), a consistent increase in RAM is not observed even after load testing with Locust at 300 RPS.

If it helps the database I am using is Postgres.

Versions : Python 3.8.10 Fastapi 0.63.0 Uvicorn 0.13.4

nikhilkharade avatar Mar 21 '22 11:03 nikhilkharade

Hello guys, my colleagues and I had a similar issue and we solved it. image

After profiling, we found out the coroutines created by uvicorn did not disappear but remained in memory (a health check request, which basically does nothing, could increase the memory usage). This phenomenon was only observed in the microservices that were using tiangolo/uvicorn-gunicorn-fastapi:python3.9-slim-2021-10-02 as the base image. After changing the image, the memory did not increase anymore.

  • if you are using tiangolo/uvicorn-gunicorn-fastapi as the base Docker image, try building from the official python image. [it worked for us]
  • if it doesn't work, profile your own cause; the script below may help you.
# [Memory Leak Profiler]
# REF: https://tech.gadventures.com/hunting-for-memory-leaks-in-asyncio-applications-3614182efaf7
# Imports needed by the snippet below:
import asyncio
import json
import tracemalloc
from collections import OrderedDict
from datetime import datetime

def format_frame(f):
    keys = ["f_code", "f_lineno"]
    return OrderedDict([(k, str(getattr(f, k))) for k in keys])

def show_coro(c):
    data = OrderedDict(
        [
            ("txt", str(c)),
            ("type", str(type(c))),
            ("done", c.done()),
            ("cancelled", False),
            ("stack", None),
            ("exception", None),
        ]
    )
    if not c.done():
        data["stack"] = [format_frame(x) for x in c.get_stack()]
    else:
        if c.cancelled():
            data["cancelled"] = True
        else:
            data["exception"] = str(c.exception())
    return data

async def trace_top20_mallocs(sleep_time = 300):
    """
    See https://docs.python.org/ko/3/library/tracemalloc.html
    """
    # has_snap_shot_before = False

    initial_snapshot = (
        tracemalloc.take_snapshot()
    )  # copy.deepcopy(tracemalloc.take_snapshot())
    while True:
        if tracemalloc.is_tracing():
            snapshot = tracemalloc.take_snapshot()
            top_stats = snapshot.compare_to(
                initial_snapshot, "lineno"
            )  # snapshot.statistics("lineno")
            print(f"[ TOP 20 ] diff {datetime.now()}")
            traces = [str(x) for x in top_stats[:20]]
            for t in traces:
                print(t)
            await asyncio.sleep(sleep_time)


async def show_all_unfinished_coroutine_status(sleep_time=200):
    cnt = 0
    while True:
        await asyncio.sleep(sleep_time)
        tasks = asyncio.all_tasks()
        if len(tasks) != cnt:

            for task in tasks:
                formatted = show_coro(task)
                print(json.dumps(formatted, indent=2))
            cnt = len(tasks)
        print(len(tasks))


loop = asyncio.get_running_loop()
asyncio.ensure_future(trace_top20_mallocs(), loop=loop)
asyncio.ensure_future(show_all_unfinished_coroutine_status(), loop=loop)

Apiens avatar Apr 05 '22 04:04 Apiens

If you're in a hurry and need a quick and temporary solution for now.

--max-requests 1 --workers 10

This helped me. You can get 10 simultaneous requests, where each worker is restarted when its request is finished. Thus the memory is released.

ConMan05 avatar Apr 06 '22 09:04 ConMan05

Hello guys, my colleagues and I had a similar issue and we solved it. image

After profiling, we found out the coroutines created by uvicorn did not disappear but remained in memory (a health check request, which basically does nothing, could increase the memory usage). This phenomenon was only observed in the microservices that were using tiangolo/uvicorn-gunicorn-fastapi:python3.9-slim-2021-10-02 as the base image. After changing the image, the memory did not increase anymore.

  • if you are using tiangolo/uvicorn-gunicorn-fastapi as the base Docker image, try building from the official python image. [it worked for us]
  • if it doesn't work, profile your own cause; the script below may help you.
# [Memory Leak Profiler]
# REF: https://tech.gadventures.com/hunting-for-memory-leaks-in-asyncio-applications-3614182efaf7
# Imports needed by the snippet below:
import asyncio
import json
import tracemalloc
from collections import OrderedDict
from datetime import datetime

def format_frame(f):
    keys = ["f_code", "f_lineno"]
    return OrderedDict([(k, str(getattr(f, k))) for k in keys])

def show_coro(c):
    data = OrderedDict(
        [
            ("txt", str(c)),
            ("type", str(type(c))),
            ("done", c.done()),
            ("cancelled", False),
            ("stack", None),
            ("exception", None),
        ]
    )
    if not c.done():
        data["stack"] = [format_frame(x) for x in c.get_stack()]
    else:
        if c.cancelled():
            data["cancelled"] = True
        else:
            data["exception"] = str(c.exception())
    return data

async def trace_top20_mallocs(sleep_time = 300):
    """
    See https://docs.python.org/ko/3/library/tracemalloc.html
    """
    # has_snap_shot_before = False

    initial_snapshot = (
        tracemalloc.take_snapshot()
    )  # copy.deepcopy(tracemalloc.take_snapshot())
    while True:
        if tracemalloc.is_tracing():
            snapshot = tracemalloc.take_snapshot()
            top_stats = snapshot.compare_to(
                initial_snapshot, "lineno"
            )  # snapshot.statistics("lineno")
            print(f"[ TOP 20 ] diff {datetime.now()}")
            traces = [str(x) for x in top_stats[:20]]
            for t in traces:
                print(t)
            await asyncio.sleep(sleep_time)


async def show_all_unfinished_coroutine_status(sleep_time=200):
    cnt = 0
    while True:
        await asyncio.sleep(sleep_time)
        tasks = asyncio.all_tasks()
        if len(tasks) != cnt:

            for task in tasks:
                formatted = show_coro(task)
                print(json.dumps(formatted, indent=2))
            cnt = len(tasks)
        print(len(tasks))


loop = asyncio.get_running_loop()
asyncio.ensure_future(trace_top20_mallocs(), loop=loop)
asyncio.ensure_future(show_all_unfinished_coroutine_status(), loop=loop)

This is great. I'm 90% sure that the issue you found is the one that I was experiencing. As one of the early posters on this issue, I haven't noticed this issue anymore--but there were a few things that have happened since... Namely:

  1. I've been upgrading FastAPI
  2. I moved from uvicorn to hypercorn since hypercorn supports HTTP/2

It's possible that I still have a memory leak, but it's not as detrimental as 2 years ago.

Re:

--max-requests 1 --workers 10. This helped me. You can get 10 simultaneous requests, where each worker is restarted when its request is finished. Thus the memory is released.

I don't think this is a viable solution since I believe this blocks the event loop and relies on multiprocessing, which skips out on one of the major benefits of the ASGI server (not getting [thread/process]bound with a single worker)

I don't know if @tiangolo has any thoughts? I feel like we're finally closer to being able to close this issue

curtiscook avatar Apr 06 '22 14:04 curtiscook

I'm having a massive leak with tensorflow + inference with a dockerized fastapi + uvicorn server. Has anyone seen that? (I'm on a machine with 120GB RAM)

Xcompanygames avatar Apr 13 '22 06:04 Xcompanygames

While it didn't completely solve the memory pile-up over time, using the gunicorn_conf.py attached below made the increase minimal over time.

import os

host = os.getenv("HOST", "0.0.0.0")
port = os.getenv("PORT", "8000")

# Gunicorn config variables
loglevel = os.getenv("LOGLEVEL", "error")
workers = int(os.getenv("WORKERS", "2"))
bind = f"{host}:{port}"
errorlog = "-"
logconfig = "/logging.conf"

evaldask avatar Apr 20 '22 07:04 evaldask

@Xcompanygames Consider using ONNX instead of TF, as it's usually faster and more reliable.

I'm having a memory leak, but I think it's because the inference data stays in memory / gets duplicated at some point. I'll update later if the issue is not related to the inference process.

Update: I wasn't closing the onnx inference session correctly. The memory accumulation is almost unnoticeable now!

JorgeRuizDev avatar May 08 '22 18:05 JorgeRuizDev
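For anyone hitting the same thing, a common pattern (an assumption about the setup above, with a placeholder model path and input shape) is to create the onnxruntime session once at startup and reuse it across requests, rather than building one inside every handler:

# Sketch: build the ONNX Runtime session once and reuse it; "model.onnx" and the
# float32 input are placeholders for whatever the real model expects.
from typing import List

import numpy as np
import onnxruntime as ort
from fastapi import FastAPI

app = FastAPI()
session = ort.InferenceSession("model.onnx")  # created once, not per request

@app.post("/predict")
def predict(values: List[float]):
    inputs = {session.get_inputs()[0].name: np.array([values], dtype=np.float32)}
    outputs = session.run(None, inputs)
    return {"prediction": outputs[0].tolist()}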

Can confirm I am experiencing the same issue. Using Python 3.10 + FastAPI + Hypercorn[uvloop] with 2 workers. The FastAPI project is brand new, so there isn't any tech debt that could possibly be the cause - no models, schemas or anything fancy being done here.

[tool.poetry.dependencies]
python = "^3.9"
celery = "^5.2.3"
neo4j = "^4.4.3"
hypercorn = {extras = ["uvloop"], version = "^0.13.2"}
fastapi = "^0.77.1"

The Docker container starts at around 105.8 MiB of RAM usage when fresh.

After running a Locust swarm (40 users) all hitting an endpoint that returns data ranging from 200KB to 8MB, the RAM usage of the Docker container grows (and sometimes shrinks, but mostly grows) until I get an OOM exception. The endpoint retrieves data from the Neo4j database and closes the driver connection cleanly each time.

I had some success making the function async def even though there was nothing to await on. But it seems that FastAPI is still holding onto some memory somewhere... caching?

I'm curious why this topic isn't more popular; surely everyone would be experiencing this. Perhaps we notice it because our endpoints return enough data for the increase in usage to stand out, whereas the average user would usually only return a few KB at a time.

Additional details: Docker CMD

CMD ["hypercorn", "app.main:app", "--bind", "0.0.0.0:8001", "--worker-class", "uvloop", "--workers", "2"]

Bears-Eat-Beets avatar May 11 '22 03:05 Bears-Eat-Beets
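One thing that may be worth ruling out in a setup like the one above (a guess, with placeholder URI, credentials, and query): the Neo4j driver is designed to be a long-lived, per-process object, with only the sessions being short-lived, so rebuilding the driver on every request can keep connection pools around longer than expected.

# Sketch: one shared driver for the process, a short-lived session per request.
from fastapi import FastAPI
from neo4j import GraphDatabase

app = FastAPI()
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

@app.get("/people")
def list_people():
    with driver.session() as session:  # cheap, and closed after each request
        records = session.run("MATCH (p:Person) RETURN p.name AS name LIMIT 25")
        return {"names": [record["name"] for record in records]}

@app.on_event("shutdown")
def close_driver():
    driver.close()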

I've also experienced this, I don't really understand what causes it.

Atheuz avatar May 11 '22 18:05 Atheuz

@tiangolo Is there any updates on this? Thank you.

SarloAkrobata avatar Jun 08 '22 09:06 SarloAkrobata

I'm also noticing that memory keeps increasing when hitting my service with 20 virtual users as part of a local performance test. I'm using python3.7 + gunicorn + fastapi + uvicorn inside a Docker container.

image

JP-Globality avatar Jun 09 '22 08:06 JP-Globality

What is the deal with the original issue of not returning {"Hello": "Sara"}? Was the original issue edited and now doesn't make sense in this regard?

EDIT: oh, okay. I see. The author included the bug report template without editing it. Sorry for the noise.

fredrikaverpil avatar Jul 18 '22 15:07 fredrikaverpil

Many folks are affected by this issue, so definitely something is happening, but it could just as well be that the problem is in the user code and not in fastapi. So I suggest, to make things easier for the maintainers, if you're affected by this issue:

  1. Try to give as many details as possible. What is your setup? What versions/Docker images are you using? What Python version, operating system, gunicorn version, etc.?
  2. Detail in what timeframe your issue appears - weeks, days, hours? Does making more requests to the server accelerate the issue? Does the issue still appear if zero requests are made?
  3. Have a look at memray to better understand your program, or use @Apiens's method above. Share your results in a Gist or on a similar platform.

Memory issues are tricky but without a good reproducer, it will be impossible for the maintainers to declare whether this is still a problem or not, and if it is, to fix it.

astrojuanlu avatar Jul 18 '22 18:07 astrojuanlu
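For point 3, here is a minimal way to capture a memray profile from inside the app; the file name and the work being traced are placeholders, and memray also ships a CLI runner if instrumenting the code is not an option.

# Sketch: wrap a suspect block of work in a memray Tracker so allocations are
# written to a capture file that can be rendered into a report afterwards.
from memray import Tracker

def suspect_work():
    return [str(i) * 100 for i in range(100_000)]  # placeholder allocation-heavy work

with Tracker("memray-capture.bin"):
    suspect_work()

# Afterwards, something like `memray flamegraph memray-capture.bin` produces a report.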

If this helps with debugging: using pyenv, I was able to reproduce the memory leak under Python 3.8.x using multiple middlewares.

2243 memory blocks: 3820.5 KiB
  File "/home/lujeni/.pyenv/versions/3.8.13/lib/python3.8/asyncio/locks.py", line 257
    self._waiters = collections.deque()
8563 memory blocks: 3725.5 KiB
  File "/home/lujeni/.pyenv/versions/3.8.13/lib/python3.8/asyncio/events.py", line 81
    self._context.run(self._callback, *self._args)
6773 memory blocks: 3470.5 KiB
  File "/home/lujeni/.pyenv/versions/3.8.13/lib/python3.8/asyncio/base_events.py", line 1859
    handle._run()
22091 memory blocks: 2841.9 KiB
  File "/home/lujeni/.pyenv/versions/uep-moulinex-3.8.13/lib/python3.8/site-packages/starlette/middleware/base.py", line 30
    await self.app(scope, request.receive, send_stream.send)
20978 memory blocks: 2390.5 KiB
  File "/home/lujeni/.pyenv/versions/3.8.13/lib/python3.8/asyncio/base_events.py", line 431
    task = tasks.Task(coro, loop=self, name=name)
4381 memory blocks: 2375.4 KiB
  File "/home/lujeni/.pyenv/versions/3.8.13/lib/python3.8/asyncio/base_events.py", line 570
    self._run_once()
[...]

With Python 3.9.x or 3.10.x there is no issue.


from fastapi import FastAPI, Request
import time

app = FastAPI()


@app.middleware("http")
async def middle_1(request: Request, call_next):
    return await call_next(request)


@app.middleware("http")
async def middle_2(request: Request, call_next):
    return await call_next(request)


@app.middleware("http")
async def middle_3(request: Request, call_next):
    return await call_next(request)

@app.middleware("http")
async def middle_4(request: Request, call_next):
    return await call_next(request)

@app.middleware("http")
async def middle_5(request: Request, call_next):
    return await call_next(request)


@app.get("/")
def read_root():
    return {"Hello": "World"}

Lujeni avatar Jul 20 '22 16:07 Lujeni

same issue

tranvannhat avatar Sep 02 '22 16:09 tranvannhat