
Child process died

Open · currenttime opened this issue 1 year ago · 13 comments

Initial Checks

  • [X] I confirm this was discussed, and the maintainers suggest I open an issue.
  • [X] I'm aware that if I created this issue without a discussion, it may be closed without a response.

Discussion Link

Command:
uvicorn.run(app='api_server:app', host="0.0.0.0", port=8002, workers=2)

Log:
/ai/miniconda3/bin/conda run -n py311_catvton --no-capture-output python /ai/brx/CatVTON-edited/api_server.py 
/ai/miniconda3/envs/py311_catvton/lib/python3.11/site-packages/huggingface_hub/file_download.py:797: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
`text_config_dict` is provided which will be used to initialize `CLIPTextConfig`. The value `text_config["id2label"]` will be overriden.
An error occurred while trying to fetch booksforcharlie/stable-diffusion-inpainting: booksforcharlie/stable-diffusion-inpainting does not appear to have a file named diffusion_pytorch_model.safetensors.
Defaulting to unsafe serialization. Pass `allow_pickle=False` to raise an error instead.
/ai/miniconda3/envs/py311_catvton/lib/python3.11/site-packages/torch/_utils.py:831: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  return self.fget.__get__(instance, owner)()
INFO:     Uvicorn running on http://0.0.0.0:8002 (Press CTRL+C to quit)
INFO:     Started parent process [15707]
INFO:     Waiting for child process [15885]
INFO:     Child process [15885] died
Fetching 10 files: 100%|█████████████████████| 10/10 [00:00<00:00, 46345.90it/s]
Fetching 10 files: 100%|█████████████████████| 10/10 [00:00<00:00, 91779.08it/s]
Fetching 10 files: 100%|█████████████████████| 10/10 [00:00<00:00, 90394.48it/s]
INFO:     Waiting for child process [15886]
INFO:     Child process [15886] died
/ai/miniconda3/envs/py311_catvton/lib/python3.11/site-packages/huggingface_hub/file_download.py:797: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
/ai/miniconda3/envs/py311_catvton/lib/python3.11/site-packages/huggingface_hub/file_download.py:797: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
`text_config_dict` is provided which will be used to initialize `CLIPTextConfig`. The value `text_config["id2label"]` will be overriden.
`text_config_dict` is provided which will be used to initialize `CLIPTextConfig`. The value `text_config["id2label"]` will be overriden.
Fetching 10 files: 100%|█████████████████████| 10/10 [00:00<00:00, 22465.47it/s]
INFO:     Waiting for child process [15887]
/ai/miniconda3/envs/py311_catvton/lib/python3.11/site-packages/huggingface_hub/file_download.py:797: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
INFO:     Child process [15887] died

Description

Child process died

Example Code

No response

Python, Uvicorn & OS Version

Running uvicorn 0.30.0 with CPython 3.11.10 on Linux

currenttime avatar Nov 07 '24 05:11 currenttime

same here.

I encountered issues running my FastAPI project using different versions of Uvicorn within a Docker container. Details are as follows:

  1. Environment

    • FastAPI version: 0.115.6
    • Python version: 3.12-alpine
    • Running in Docker
  2. Issue with uvicorn <= 0.29.0
    When I use Uvicorn versions <= 0.29.0, the process gets permanently stuck at the following log message with no more output:

2024/12/16 11:44:36.354     INFO: [uvicorn.error] Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)  
2024/12/16 11:44:36.354     INFO: [uvicorn.error] Started parent process [1]  
  3. Issue with uvicorn >= 0.30.0 (including the latest version 0.34.0)
    With newer versions of Uvicorn, the process starts, but the child processes keep dying repeatedly:
2024/12/16 12:20:06.593     INFO: [uvicorn.error] Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
2024/12/16 12:20:06.594     INFO: [uvicorn.error] Started parent process [1]
2024/12/16 12:20:07.622     INFO: [uvicorn.error] Waiting for child process [18]
2024/12/16 12:20:07.623     INFO: [uvicorn.error] Child process [18] died
2024/12/16 12:20:07.623     INFO: [uvicorn.error] Waiting for child process [19]
2024/12/16 12:20:07.624     INFO: [uvicorn.error] Child process [19] died
2024/12/16 12:20:08.127     INFO: [uvicorn.error] Waiting for child process [19]
2024/12/16 12:20:08.129     INFO: [uvicorn.error] Child process [19] died
2024/12/16 12:20:13.136     INFO: [uvicorn.error] Waiting for child process [23]
2024/12/16 12:20:13.137     INFO: [uvicorn.error] Child process [23] died
... (repeats indefinitely) ...  
  4. Docker Setup
    Here is my Dockerfile:
FROM python:3.12-alpine  

RUN sed -i 's/dl-cdn.alpinelinux.org/mirrors.aliyun.com/g' /etc/apk/repositories \  
    && apk add --update caddy gcc musl-dev libffi-dev  

WORKDIR /app  
COPY Backend/requirements.txt /tmp/requirements.txt  
RUN pip install --no-cache-dir -r /tmp/requirements.txt -i https://mirrors.aliyun.com/pypi/simple  

COPY Backend ./backend  

EXPOSE 8000  

COPY startup.sh /app/startup.sh  
RUN chmod +x /app/startup.sh  
CMD ["/app/startup.sh"]  

My startup.sh script:

#!/bin/sh  
set -e  

cd /app/backend  

exec uvicorn main:get_app \  
    --host 0.0.0.0 \  
    --port 8000 \  
    --workers 2 \  
    --proxy-headers \  
    --forwarded-allow-ips '*' \  
    --factory \  
    --log-config logging_config.yaml  
  5. Local Machine Behavior
    When I run the same command directly on my Windows 11 PC, the program runs normally:
uvicorn main:get_app --host 0.0.0.0 --port 8000 --workers 2 --proxy-headers --factory --log-config logging_config.yaml  

Yilimmilk avatar Dec 16 '24 04:12 Yilimmilk

When I start my service with fastapi==0.115.0 and uvicorn[standard]==0.34.0, with workers set to 2, the child processes die with the error shown below:

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "D:\python_projects\ui_maker.venv\Lib\site-packages\starlette\routing.py", line 700, in lifespan
    await receive()
  File "D:\python_projects\ui_maker.venv\Lib\site-packages\uvicorn\lifespan\on.py", line 137, in receive
    return await self.receive_queue.get()
  File "D:\python311\Lib\asyncio\queues.py", line 158, in get
    await getter
asyncio.exceptions.CancelledError

Process SpawnProcess-1:
Traceback (most recent call last):
  File "D:\python311\Lib\multiprocessing\process.py", line 314, in _bootstrap
    self.run()
  File "D:\python311\Lib\multiprocessing\process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "D:\python_projects\ui_maker.venv\Lib\site-packages\uvicorn\_subprocess.py", line 80, in subprocess_started
    target(sockets=sockets)
  File "D:\python_projects\ui_maker.venv\Lib\site-packages\uvicorn\supervisors\multiprocess.py", line 63, in target
    return self.real_target(sockets)
  File "D:\python_projects\ui_maker.venv\Lib\site-packages\uvicorn\server.py", line 66, in run
    return asyncio.run(self.serve(sockets=sockets))
  File "D:\python311\Lib\asyncio\runners.py", line 190, in run
    return runner.run(main)
  File "D:\python311\Lib\asyncio\runners.py", line 118, in run
    return self._loop.run_until_complete(task)
  File "D:\python311\Lib\asyncio\base_events.py", line 654, in run_until_complete
    return future.result()
  File "D:\python_projects\ui_maker.venv\Lib\site-packages\uvicorn\server.py", line 70, in serve
    await self._serve(sockets)
  File "D:\python_projects\ui_maker.venv\Lib\site-packages\uvicorn\server.py", line 85, in _serve
    await self.startup(sockets=sockets)
  File "D:\python_projects\ui_maker.venv\Lib\site-packages\uvicorn\server.py", line 135, in startup
    server = await loop.create_server(create_protocol, sock=sock, ssl=config.ssl, backlog=config.backlog)
  File "D:\python311\Lib\asyncio\base_events.py", line 1561, in create_server
    server._start_serving()
  File "D:\python311\Lib\asyncio\base_events.py", line 316, in _start_serving
    sock.listen(self._backlog)
OSError: [WinError 10022] An invalid argument was supplied.

How can I resolve this problem? My system is Windows 10.

hunter2009pf avatar Dec 19 '24 03:12 hunter2009pf

I've also experienced similar behavior, but in my case the child process doesn't die on startup. When a request is received, it fetches a lot of data from a database to process it. As soon as RAM usage exceeds 4-5GB, the child process dies. My machine has 64GB of RAM, and while I was testing there were no other requests that could have caused memory pressure.

However, I noticed that if I run it in a single worker mode, it does not happen.

Here's the Stack Overflow question I posted: https://stackoverflow.com/questions/79311202/fastapi-child-process-gets-killed-even-with-enough-unused-ram-and-cpu-left-in-th?noredirect=1#comment139857620_79311202

sriramr98 avatar Dec 30 '24 10:12 sriramr98

I updated from version 0.27.0 to 0.34.0 and started experiencing these same issues with FastAPI.

Ricardonacif avatar Jan 03 '25 14:01 Ricardonacif

I assume the ping-pong protocol used to figure out whether a child is alive is buggy and is killing healthy child processes.

A threading.Thread is used to keep ponging, but that won't always work because of the GIL. If you're starting a big process with a lot of imports, the ponging loop may not get a chance to run.

The same applies if you're loading or parsing a lot of data into memory (using threads or plain asyncio rather than multiprocessing): it's possible the ponging loop won't run in time.
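
To illustrate, here is a minimal sketch of the ping/pong pattern I'm describing (illustrative only, not uvicorn's actual code): a daemon thread in the child answers pings, and the parent gives up if no pong arrives within the timeout.

import multiprocessing
import threading
import time

def worker(child_conn):
    # A daemon thread like this answers pings. If heavy imports or data
    # loading keep the main thread (and the GIL) busy, this loop may not
    # get a chance to run in time.
    def pong_loop():
        while True:
            child_conn.recv()          # wait for a ping from the parent
            child_conn.send(b"pong")   # answer it
    threading.Thread(target=pong_loop, daemon=True).start()
    time.sleep(10)  # stand-in for the real server loop

if __name__ == "__main__":
    parent_conn, child_conn = multiprocessing.Pipe()
    proc = multiprocessing.Process(target=worker, args=(child_conn,))
    proc.start()
    parent_conn.send(b"ping")
    # The parent treats the child as dead if no pong arrives within the timeout.
    print("alive" if parent_conn.poll(5) else "considered dead")
    proc.terminate()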

vjeranc avatar Feb 18 '25 10:02 vjeranc

Hello, I have also been impacted by this issue. My API is a Flask app, but I wrapped it so it can be served through an ASGI interface.
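
For reference, a simplified sketch of the kind of WSGI-to-ASGI wrapping I mean (using asgiref's WsgiToAsgi here purely as an example; the actual wrapper used may differ):

from flask import Flask
from asgiref.wsgi import WsgiToAsgi

flask_app = Flask(__name__)

@flask_app.get("/health")
def health():
    return {"status": "ok"}

# ASGI wrapper so uvicorn can serve the Flask (WSGI) app,
# e.g. `uvicorn main:app --workers 2`
app = WsgiToAsgi(flask_app)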

When I set workers to more than 1, the child processes get killed automatically.

I receive many log lines like this:

INFO:     Waiting for child process [23]
INFO:     Child process [23] died

But it's fine when I set workers to 1.

Host Version

lsb_release -a
No LSB modules are available.
Distributor ID:	Ubuntu
Description:	Ubuntu 22.04.5 LTS
Release:	22.04
Codename:	jammy

Uvicorn Version

 uvicorn --version
Running uvicorn 0.34.0 with CPython 3.10.12 on Linux

restuhaqza avatar Mar 09 '25 08:03 restuhaqza

It means the polling thread did not start in time after the process was spawned. The Uvicorn main process kills a child if it does not pong in response to its ping; that's the log you're seeing. The child did not actually die on its own, it just had not yet started the thread that keeps ponging.

It might be the case that it takes more than 5 seconds for your child process to spawn. If fork were used instead of spawn, this would be a non-issue in cases where your Python imports are slow.
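
To see the cost difference, here is a rough, self-contained timing sketch (not related to uvicorn internals): with "spawn" the child starts a fresh interpreter and re-imports modules, while "fork" inherits the parent's already-loaded state.

import multiprocessing
import time

def child():
    pass  # a real worker would start serving here

if __name__ == "__main__":
    # "fork" is unavailable on Windows; on Linux both methods exist.
    for method in ("fork", "spawn"):
        ctx = multiprocessing.get_context(method)
        start = time.perf_counter()
        p = ctx.Process(target=child)
        p.start()
        p.join()
        # spawn pays the interpreter start-up and module import cost again;
        # the gap grows with every heavy import at the top of this file.
        print(f"{method}: {time.perf_counter() - start:.3f}s")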

vjeranc avatar Mar 09 '25 11:03 vjeranc

same error even on latest version

Rajrup-TransEV avatar Mar 10 '25 10:03 Rajrup-TransEV

Following up on my earlier comment above (same FastAPI/Docker setup and logs as described there).
Update: I downgraded the Docker image from python:3.12-alpine3.21 to python:3.12-alpine3.20, and everything works as expected.

Yilimmilk avatar Mar 10 '25 10:03 Yilimmilk

I encountered the exact same issue with the official Docker image python:3.12-slim (Debian bookworm).

When running as a Docker container, it crashed on a Proxmox VM with limited resources but worked on my laptop. This only occurred when workers > 1 and an endpoint executing a heavy CPU-bound task was called (top showed 100% CPU but 5% RAM). With only one worker it did the job, but it took 2 minutes on the VM versus 20s on my laptop.

vincentditlevinz avatar May 09 '25 09:05 vincentditlevinz

I also hit this issue, and it appears to be due to the "spawn"ed worker process taking more than 5s to import the target module before it's able to respond to pings. I optimized my imports to be lazy and the issue went away...
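
For anyone wanting to try the same thing, a simplified sketch of the lazy-import idea (the names here are illustrative, not my actual code): keep heavy imports and loading out of module top level and defer them until the worker is already up.

from contextlib import asynccontextmanager
from fastapi import FastAPI

@asynccontextmanager
async def lifespan(app: FastAPI):
    # Do the heavy import/loading here instead of at module top level,
    # so the freshly spawned worker can answer health-check pings quickly.
    import json  # stand-in for whatever slow dependency you actually load
    app.state.model = json.loads("{}")  # placeholder for real model loading
    yield

app = FastAPI(lifespan=lifespan)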

5 seconds doesn't seem to be enough, nor is it configurable. https://github.com/encode/uvicorn/blob/66b9b58ad90112d54e7e3c4160c837ec72de51da/uvicorn/supervisors/multiprocess.py#L37-L42

mabrowning avatar May 12 '25 21:05 mabrowning

After some debugging, sharing my findings: https://github.com/encode/uvicorn/releases/tag/0.30.0

In version 0.30.0 a new multiprocess manager was implemented, which uses spawn by default instead of fork, even on Linux. spawn is slower than fork, so the default timeout of 5s stopped being enough in some cases. The is_alive check starts here:

https://github.com/encode/uvicorn/blob/66b9b58ad90112d54e7e3c4160c837ec72de51da/uvicorn/supervisors/multiprocess.py#L65-L69
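
Paraphrased, the check behaves roughly like the sketch below (not the exact source; details differ between versions): the parent pings the child over a pipe and treats it as dead if no pong arrives within the timeout.

def is_alive(self, timeout: float = 5) -> bool:
    # If the OS already reports the process as dead, give up immediately.
    if not self.process.is_alive():
        return False
    # Otherwise ping the child over a pipe and wait up to `timeout` seconds
    # for a pong from its keep-alive thread; no pong means "died".
    self.parent_conn.send(b"ping")
    if self.parent_conn.poll(timeout):
        self.parent_conn.recv()
        return True
    return False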

Currently there is no way to pass the timeout value as a parameter. Here's a PR trying to do just that: https://github.com/encode/uvicorn/pull/2397

If you want to test whether a higher timeout fixes your issue, here is a quick monkey patch:

import uvicorn
import uvicorn.supervisors.multiprocess  # make sure the submodule is loaded before patching
from typing import Any

original_uvicorn_is_alive = uvicorn.supervisors.multiprocess.Process.is_alive

def patched_is_alive(self: Any) -> bool:
    timeout = 20  # seconds to wait for the worker's pong before declaring it dead
    return original_uvicorn_is_alive(self, timeout)

uvicorn.supervisors.multiprocess.Process.is_alive = patched_is_alive

Petr-Siegl avatar May 20 '25 11:05 Petr-Siegl

For reference, since I lost a week investigating this issue: the same thing happens for me when using FastAPI with long-running BackgroundTasks inside a python:3.12-bookworm based Docker image and --workers 2. Downgrading to uvicorn==0.29.0 fixes the issue.
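
(For anyone applying the same workaround, the pin is simply:)

pip install "uvicorn==0.29.0"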

guillp avatar Jun 03 '25 09:06 guillp

Hi, I am also facing the same issue. Does anyone know whether managing workers through Gunicorn would resolve this issue, or would it still be there? @guillp Have you run into any issues since downgrading to version 0.29.0 over the past month?

bharatmshrtr avatar Jul 13 '25 11:07 bharatmshrtr

@bharatmshrtr After skimming Gunicorn's source code: they do not use a ping-pong loop between processes and won't kill your workers if it takes them a few seconds to start up. Of course, they may also use the more efficient fork (where children share the parent's memory, so the interpreter is already up and all Python modules are loaded), in which case you wouldn't have this problem anyway; I did not check.
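
For reference, the usual pattern is something like the command below; note that the UvicornWorker class historically ships with uvicorn, though recent releases deprecate it in favour of the separate uvicorn-worker package:

gunicorn main:app --workers 2 --worker-class uvicorn.workers.UvicornWorker --bind 0.0.0.0:8000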

vjeranc avatar Jul 14 '25 07:07 vjeranc

What is the status of resolving this issue? My experience with an Airflow 3.0.6 environment is that the system is very unpredictable with uvicorn 0.3x because of the child-died issue. On a positive note, the following monkey patch fixes the problem, following the suggestion above at https://github.com/Kludex/uvicorn/issues/2506#issuecomment-2894004072.

import uvicorn
try:
    original_is_alive = uvicorn.supervisors.multiprocess.Process.is_alive

    def patched_is_alive(self, *args, **kwargs):
        # Force a default timeout if none is provided
        timeout = kwargs.get("timeout", 30)
        return original_is_alive(self, timeout=timeout)

    uvicorn.supervisors.multiprocess.Process.is_alive = patched_is_alive
    print("[sitecustomize] Patched uvicorn.supervisors.multiprocess.Process.is_alive with timeout=30")
except Exception as e:
    # Don't break Python startup if uvicorn internals change
    print(f"[sitecustomize] Skipped uvicorn patch: {e}")

It is obvious that the default 5s timeout is problematic. I changed it to 30 in the patch as shown above, and my system now works stably and fast. Would there be any problem with uvicorn at least allowing the timeout value to be changed?

piosystems avatar Sep 16 '25 18:09 piosystems

I've merged https://github.com/Kludex/uvicorn/pull/2711. Is it enough here?

Kludex avatar Sep 23 '25 13:09 Kludex

Is this issue fixed? I'm experiencing the same issue on 0.37.

Edit: sorry, my bad. I can confirm it's now working as expected! Thank you!

DavidKaub avatar Sep 23 '25 14:09 DavidKaub

Thanks @Kludex this looks great!

@DavidKaub The fix exposes a new configuration allowing you to increase the timeout from the default of 5 seconds, but the default itself wasn't changed.

mabrowning avatar Sep 24 '25 02:09 mabrowning

I think we are good here.

Kludex avatar Sep 30 '25 10:09 Kludex

I'm still facing it with uvicorn>=0.37.0. It looks like this:

...
INFO:     Waiting for child process [15887]
INFO:     Child process [15887] died
INFO:     Waiting for child process [15888]
INFO:     Child process [15888] died
...

hihunjin avatar Oct 01 '25 08:10 hihunjin

Even after configuring the timeout from 5 seconds to something greater?

vjeranc avatar Oct 01 '25 09:10 vjeranc

Solved it with a 15-second timeout.

hihunjin avatar Oct 01 '25 12:10 hihunjin

Issue: Uvicorn introduced a health check into the process manager, which is able to restart dead workers. This broke some applications because they were taking too long to start and timing out the health check. By default this timeout is 5 seconds.

With the latest release, you are able to set --timeout-worker-healthcheck to a higher number.
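
For example:

uvicorn main:app --workers 2 --timeout-worker-healthcheck 30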


I'm locking this issue so users don't need to scroll around to find answers.

Also, if you think this is not a sensible default, please create a new discussion.

Kludex avatar Oct 01 '25 12:10 Kludex