Child process died
Initial Checks
- [X] I confirm this was discussed, and the maintainers suggest I open an issue.
- [X] I'm aware that if I created this issue without a discussion, it may be closed without a response.
Discussion Link
Command:
uvicorn.run(app='api_server:app', host="0.0.0.0", port=8002, workers=2)
Log:
/ai/miniconda3/bin/conda run -n py311_catvton --no-capture-output python /ai/brx/CatVTON-edited/api_server.py
/ai/miniconda3/envs/py311_catvton/lib/python3.11/site-packages/huggingface_hub/file_download.py:797: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
warnings.warn(
`text_config_dict` is provided which will be used to initialize `CLIPTextConfig`. The value `text_config["id2label"]` will be overriden.
An error occurred while trying to fetch booksforcharlie/stable-diffusion-inpainting: booksforcharlie/stable-diffusion-inpainting does not appear to have a file named diffusion_pytorch_model.safetensors.
Defaulting to unsafe serialization. Pass `allow_pickle=False` to raise an error instead.
/ai/miniconda3/envs/py311_catvton/lib/python3.11/site-packages/torch/_utils.py:831: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
return self.fget.__get__(instance, owner)()
INFO: Uvicorn running on http://0.0.0.0:8002 (Press CTRL+C to quit)
INFO: Started parent process [15707]
INFO: Waiting for child process [15885]
INFO: Child process [15885] died
Fetching 10 files: 100%|██████████| 10/10 [00:00<00:00, 46345.90it/s]
Fetching 10 files: 100%|██████████| 10/10 [00:00<00:00, 91779.08it/s]
Fetching 10 files: 100%|██████████| 10/10 [00:00<00:00, 90394.48it/s]
INFO: Waiting for child process [15886]
INFO: Child process [15886] died
/ai/miniconda3/envs/py311_catvton/lib/python3.11/site-packages/huggingface_hub/file_download.py:797: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
warnings.warn(
/ai/miniconda3/envs/py311_catvton/lib/python3.11/site-packages/huggingface_hub/file_download.py:797: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
warnings.warn(
`text_config_dict` is provided which will be used to initialize `CLIPTextConfig`. The value `text_config["id2label"]` will be overriden.
`text_config_dict` is provided which will be used to initialize `CLIPTextConfig`. The value `text_config["id2label"]` will be overriden.
Fetching 10 files: 100%|██████████| 10/10 [00:00<00:00, 22465.47it/s]
INFO: Waiting for child process [15887]
/ai/miniconda3/envs/py311_catvton/lib/python3.11/site-packages/huggingface_hub/file_download.py:797: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
warnings.warn(
INFO: Child process [15887] died
Description
Child process died
Example Code
No response
Python, Uvicorn & OS Version
Running uvicorn 0.30.0 with CPython 3.11.10 on Linux
same here.
I encountered issues running my FastAPI project using different versions of Uvicorn within a Docker container. Details are as follows:
Environment
- FastAPI version: 0.115.6
- Python version: 3.12-alpine
- Running in Docker
Issue with uvicorn <= 0.29.0
When I use Uvicorn versions <=0.29.0, the process gets permanently stuck at the following log message with no more output:
2024/12/16 11:44:36.354 INFO: [uvicorn.error] Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
2024/12/16 11:44:36.354 INFO: [uvicorn.error] Started parent process [1]
Issue with uvicorn >= 0.30.0 (including the latest version, 0.34.0)
With newer versions of Uvicorn, the process starts, but the child processes keep dying repeatedly:
2024/12/16 12:20:06.593 INFO: [uvicorn.error] Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
2024/12/16 12:20:06.594 INFO: [uvicorn.error] Started parent process [1]
2024/12/16 12:20:07.622 INFO: [uvicorn.error] Waiting for child process [18]
2024/12/16 12:20:07.623 INFO: [uvicorn.error] Child process [18] died
2024/12/16 12:20:07.623 INFO: [uvicorn.error] Waiting for child process [19]
2024/12/16 12:20:07.624 INFO: [uvicorn.error] Child process [19] died
2024/12/16 12:20:08.127 INFO: [uvicorn.error] Waiting for child process [19]
2024/12/16 12:20:08.129 INFO: [uvicorn.error] Child process [19] died
2024/12/16 12:20:13.136 INFO: [uvicorn.error] Waiting for child process [23]
2024/12/16 12:20:13.137 INFO: [uvicorn.error] Child process [23] died
... (repeats indefinitely) ...
Docker Setup
Here's my Dockerfile:
FROM python:3.12-alpine
RUN sed -i 's/dl-cdn.alpinelinux.org/mirrors.aliyun.com/g' /etc/apk/repositories \
&& apk add --update caddy gcc musl-dev libffi-dev
WORKDIR /app
COPY Backend/requirements.txt /tmp/requirements.txt
RUN pip install --no-cache-dir -r /tmp/requirements.txt -i https://mirrors.aliyun.com/pypi/simple
COPY Backend ./backend
EXPOSE 8000
COPY startup.sh /app/startup.sh
RUN chmod +x /app/startup.sh
CMD ["/app/startup.sh"]
My startup.sh script:
#!/bin/sh
set -e
cd /app/backend
exec uvicorn main:get_app \
--host 0.0.0.0 \
--port 8000 \
--workers 2 \
--proxy-headers \
--forwarded-allow-ips '*' \
--factory \
--log-config logging_config.yaml
Local Machine Behavior
When I run the same command directly on my Windows 11 PC, the program runs normally:
uvicorn main:get_app --host 0.0.0.0 --port 8000 --workers 2 --proxy-headers --factory --log-config logging_config.yaml
When I start my service with fastapi==0.115.0 and uvicorn[standard]==0.34.0 and workers set to 2, the child processes die with the error shown below:
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "D:\python_projects\ui_maker\.venv\Lib\site-packages\starlette\routing.py", line 700, in lifespan
    await receive()
  File "D:\python_projects\ui_maker\.venv\Lib\site-packages\uvicorn\lifespan\on.py", line 137, in receive
    return await self.receive_queue.get()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\python311\Lib\asyncio\queues.py", line 158, in get
    await getter
asyncio.exceptions.CancelledError
Process SpawnProcess-1:
Traceback (most recent call last):
  File "D:\python311\Lib\multiprocessing\process.py", line 314, in _bootstrap
    self.run()
  File "D:\python311\Lib\multiprocessing\process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "D:\python_projects\ui_maker\.venv\Lib\site-packages\uvicorn\_subprocess.py", line 80, in subprocess_started
    target(sockets=sockets)
  File "D:\python_projects\ui_maker\.venv\Lib\site-packages\uvicorn\supervisors\multiprocess.py", line 63, in target
    return self.real_target(sockets)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\python_projects\ui_maker\.venv\Lib\site-packages\uvicorn\server.py", line 66, in run
    return asyncio.run(self.serve(sockets=sockets))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\python311\Lib\asyncio\runners.py", line 190, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "D:\python311\Lib\asyncio\runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\python311\Lib\asyncio\base_events.py", line 654, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "D:\python_projects\ui_maker\.venv\Lib\site-packages\uvicorn\server.py", line 70, in serve
    await self._serve(sockets)
  File "D:\python_projects\ui_maker\.venv\Lib\site-packages\uvicorn\server.py", line 85, in _serve
    await self.startup(sockets=sockets)
  File "D:\python_projects\ui_maker\.venv\Lib\site-packages\uvicorn\server.py", line 135, in startup
    server = await loop.create_server(create_protocol, sock=sock, ssl=config.ssl, backlog=config.backlog)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\python311\Lib\asyncio\base_events.py", line 1561, in create_server
    server._start_serving()
  File "D:\python311\Lib\asyncio\base_events.py", line 316, in _start_serving
    sock.listen(self._backlog)
OSError: [WinError 10022] An invalid argument was supplied.
How can I resolve this problem? My system is Windows 10.
I've also experienced similar behavior, but in my case the child process doesn't die on startup. When a request is received, it fetches a lot of data from a database to process it. As soon as memory usage exceeds 4-5 GB, the child process dies. My machine has 64 GB of RAM, and while I was testing there were no other requests causing memory pressure.
However, I noticed that if I run it in a single worker mode, it does not happen.
Here's the issue link I posted https://stackoverflow.com/questions/79311202/fastapi-child-process-gets-killed-even-with-enough-unused-ram-and-cpu-left-in-th?noredirect=1#comment139857620_79311202
I updated from version 0.27.0 to 0.34.0 and started experiencing these same issues with FastAPI.
I assume the ping-pong protocol used to figure out whether a child is alive is buggy and is killing healthy child processes.
A threading.Thread is used to always pong, but that won't necessarily get to run because of the GIL. So if you're starting a big process with a lot of imports, the ponging loop may not get a chance to run.
Or, if you're loading/parsing a lot of data into memory (and not using multiprocessing, just threading or plain asyncio), it's possible the ponging loop won't run either.
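For what it's worth, here's a minimal, self-contained sketch of that failure mode (my own illustration, not uvicorn's actual code), assuming the parent pings the child over a Pipe and gives it 5 seconds to pong:

import multiprocessing as mp
import threading
import time

# Stand-in for heavy module-level imports (large ML frameworks, model loading, ...).
# Under the "spawn" start method the child re-executes this module, so it pays
# this cost again before its pong thread can even start.
time.sleep(10)

def always_pong(conn):
    while True:
        conn.recv()
        conn.send(b"pong")

def child_target(conn):
    # The pong thread only starts once the re-import above has finished.
    threading.Thread(target=always_pong, args=(conn,), daemon=True).start()
    time.sleep(60)  # stands in for actually serving requests

if __name__ == "__main__":
    ctx = mp.get_context("spawn")
    parent_conn, child_conn = ctx.Pipe()
    ctx.Process(target=child_target, args=(child_conn,), daemon=True).start()
    parent_conn.send(b"ping")
    # A 5 s poll times out even though the child is healthy - the same
    # situation the supervisor interprets as a dead worker.
    print("ponged within 5 s:", parent_conn.poll(5))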
Hello, I have also been impacted by this issue. My API is a Flask app, but I wrapped it so it can be served through an ASGI interface.
When I set workers to more than 1, the workers get killed automatically. I receive a lot of log lines like this:
INFO: Waiting for child process [23]
INFO: Child process [23] died
But it's fine when I set workers to 1.
Host Version
lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 22.04.5 LTS
Release: 22.04
Codename: jammy
Uvicorn Version
uvicorn --version
Running uvicorn 0.34.0 with CPython 3.10.12 on Linux
It means that the polling thread did not start in time after the process was spawned. The Uvicorn main process kills the child if it does not pong back to its ping; that's the log you're seeing. The child did not die on its own, it just had not yet started the thread that keeps ponging.
It might be the case that it takes more than 5 seconds for your child to spawn. If fork were used instead of spawn, this would be a non-issue in cases where your Python imports are slow.
same error even on latest version
Update: I downgraded the Docker image from python:3.12-alpine3.21 to python:3.12-alpine3.20, and now everything works as expected.
I encountered the exact same issue with the official docker image python:3.12-slim (with Debian bookworm).
When running as a Docker container, it crashed on a Proxmox VM with limited resources, but worked on my laptop. This only occurred when workers > 1 and an endpoint executing a heavy CPU-bound task was called (top showed 100% CPU but 5% RAM). With only one worker it did the job, but only after 2 minutes on the VM, whereas it took 20 s on my laptop.
I also hit this issue, and it appears to be due to the spawned worker process taking more than 5 s to import the target module before it's able to respond to pings. I optimized my imports to be lazy and the issue went away...
5 seconds doesn't seem to be enough, nor is it configurable. https://github.com/encode/uvicorn/blob/66b9b58ad90112d54e7e3c4160c837ec72de51da/uvicorn/supervisors/multiprocess.py#L37-L42
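In case it helps someone, the lazy-import workaround can look roughly like this: move heavy imports out of module level and into the application's startup, so the spawned worker is up and answering pings quickly. This is only a sketch; the torch import and the model are placeholders for whatever is slow in your app:

from contextlib import asynccontextmanager

from fastapi import FastAPI

@asynccontextmanager
async def lifespan(app: FastAPI):
    # Heavy imports happen here, after the worker process has started and is
    # already responding to the supervisor's pings, not at module import time.
    import torch  # hypothetical heavy dependency

    app.state.model = torch.nn.Linear(4, 4)  # stand-in for real model loading
    yield
    app.state.model = None

app = FastAPI(lifespan=lifespan)

@app.get("/health")
async def health() -> dict[str, str]:
    return {"status": "ok"}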
After some debugging, here are my findings: https://github.com/encode/uvicorn/releases/tag/0.30.0
In version 0.30.0 a new multiprocess manager was implemented, which uses spawn by default instead of fork, even on Linux. spawn is slower than fork, so the default timeout of 5 s stopped being enough in some cases. The is_alive check starts here:
https://github.com/encode/uvicorn/blob/66b9b58ad90112d54e7e3c4160c837ec72de51da/uvicorn/supervisors/multiprocess.py#L65-L69
Currently there is no way to pass the timeout value as a parameter. Here's a PR trying to do just that: https://github.com/encode/uvicorn/pull/2397
If you want to test whether a higher timeout fixes your issue, here is a quick monkey patch:
import uvicorn
import uvicorn.supervisors.multiprocess

original_uvicorn_is_alive = uvicorn.supervisors.multiprocess.Process.is_alive

def patched_is_alive(self) -> bool:
    timeout = 20  # seconds; the hard-coded default is 5
    return original_uvicorn_is_alive(self, timeout)

uvicorn.supervisors.multiprocess.Process.is_alive = patched_is_alive
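Note that the patch has to run in the parent process before the supervisor starts, e.g. at the top of the script that calls uvicorn.run(app='api_server:app', host="0.0.0.0", port=8002, workers=2) from the original report, because is_alive is called by the parent's multiprocess manager, not by the workers.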
For reference, since I lost a week investigating this issue: the same thing happens for me when using FastAPI with long-running BackgroundTasks inside a python:3.12-bookworm based Docker image and --workers 2. Downgrading to uvicorn==0.29.0 fixes the issue.
Hi, I am also facing the same issue. Does anyone know whether this issue would be resolved if I do worker management through Gunicorn, or would it still be there? @guillp Have you faced the issue since downgrading to version 0.29.0 over the past month?
@bharatmshrtr After skimming the Gunicorn source code: it does not use a ping-pong loop for its processes and won't kill your workers if it takes them a few seconds to start up. Of course, it may also use the more efficient fork (where children share the parent's memory, so the interpreter is already up and all Python modules are loaded), in which case you won't have that problem anyway; I did not check.
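For reference, the usual pattern for that (the module path main:app is just a placeholder here) is to let Gunicorn spawn uvicorn-class workers:

gunicorn main:app --workers 2 --bind 0.0.0.0:8000 --worker-class uvicorn.workers.UvicornWorker

(Recent uvicorn releases point to the separate uvicorn-worker package instead of the built-in uvicorn.workers module, so check which one your version expects.)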
What is the status of resolving this issue? My experience with an Airflow 3.0.6 environment is that the system is very unpredictable with uvicorn 0.3x because of the child-died issue. On a positive note, the following monkey patch fixes the problem, following the suggestion above at https://github.com/Kludex/uvicorn/issues/2506#issuecomment-2894004072.
import uvicorn

try:
    original_is_alive = uvicorn.supervisors.multiprocess.Process.is_alive

    def patched_is_alive(self, *args, **kwargs):
        # Force a default timeout if none is provided
        timeout = kwargs.get("timeout", 30)
        return original_is_alive(self, timeout=timeout)

    uvicorn.supervisors.multiprocess.Process.is_alive = patched_is_alive
    print("[sitecustomize] Patched uvicorn.supervisors.multiprocess.Process.is_alive with timeout=30")
except Exception as e:
    # Don't break Python startup if uvicorn internals change
    print(f"[sitecustomize] Skipped uvicorn patch: {e}")
It is obvious that the default 5 s timeout is problematic. I changed it to 30 in the patch as shown above and my system works stably and fast. Would there be any problem with uvicorn at least allowing the timeout value to be changed?
I've merged https://github.com/Kludex/uvicorn/pull/2711, is it enough here?
Is this issue fixed? I'm experiencing the same issue on 0.37.
Edit: sorry, my bad - I can confirm it's now working as expected! Thank you!
Thanks @Kludex this looks great!
@DavidKaub The fix exposes a new configuration allowing you to increase the timeout from the default of 5 seconds, but the default itself wasn't changed.
I think we are good here.
I'm still facing it with uvicorn>=0.37.0.
It looks like this:
...
INFO: Waiting for child process [15887]
INFO: Child process [15887] died
INFO: Waiting for child process [15888]
INFO: Child process [15888] died
...
Even after configuring the timeout from 5 seconds to something greater?
Solved it with a 15-second timeout.
Issue: Uvicorn introduced a health check into the process manager, which was able to restart dead workers. This broke some applications because they were taking too long to start and timing out this health check. By default this value is 5 seconds.
With the latest release, you are able to set --timeout-worker-healthcheck to a higher number.
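For example, adapting the startup.sh from earlier in the thread (pick a value that comfortably covers your workers' import and startup time; 30 here is just an example):

exec uvicorn main:get_app \
    --host 0.0.0.0 \
    --port 8000 \
    --workers 2 \
    --proxy-headers \
    --forwarded-allow-ips '*' \
    --factory \
    --log-config logging_config.yaml \
    --timeout-worker-healthcheck 30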
I'm locking this issue so users don't need to scroll around to find answers.
Also, if you think this is not a sensible default, please create a new discussion.