PaddleHub

hub serving: with use_multiprocess set to true, the lac model prediction service fails to start (WORKER TIMEOUT)

Open water-2022 opened this issue 2 years ago • 0 comments

1. Test environment: CentOS Linux release 7.9.2009 (Core), CPU(s): 8, mem: 32 GB, conda 4.6.14, Python 3.10.4, paddlehub 2.3.1, paddlenlp 2.5.1, paddlepaddle 2.4.2, lac 2.4.0

2. Configuration file (serving_config.json):

{
  "modules_info": {
    "lac": {
      "init_args": {
        "version": "2.4.0",
        "user_dict": "./dict.txt"
      },
      "predict_args": {}
    }
  },
  "port": 8866,
  "use_multiprocess": true,
  "workers": 2,
  "timeout": 3000
}
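
Since the config passes a relative user_dict path ("./dict.txt"), one cheap sanity check before starting the service is to confirm that the file resolves from the directory the command is run in. The following is only a sketch: check_serving_config and base_dir are hypothetical names, and it assumes relative paths are resolved against the working directory, which is not confirmed for hub serving. It is a pre-flight check, not a diagnosis of the timeout.

```python
import json
from pathlib import Path

# Same content as the serving_config.json shown above.
CONFIG_TEXT = """
{
  "modules_info": {
    "lac": {
      "init_args": {"version": "2.4.0", "user_dict": "./dict.txt"},
      "predict_args": {}
    }
  },
  "port": 8866,
  "use_multiprocess": true,
  "workers": 2,
  "timeout": 3000
}
"""


def check_serving_config(config_text, base_dir="."):
    """Return a list of problems found in a serving config (hypothetical helper).

    Only checks that each module's user_dict, if set, exists relative to
    base_dir. Assumption: relative paths resolve against the working directory.
    """
    config = json.loads(config_text)
    problems = []
    for name, info in config.get("modules_info", {}).items():
        user_dict = info.get("init_args", {}).get("user_dict")
        if user_dict and not (Path(base_dir) / user_dict).is_file():
            problems.append(f"{name}: user_dict not found at {Path(base_dir) / user_dict}")
    return problems


# Run from the directory where `hub serving start` is invoked.
for problem in check_serving_config(CONFIG_TEXT, base_dir="."):
    print(problem)
```

An empty result only means the dictionary file is reachable; it says nothing about how long the module takes to load inside a worker.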

3. Command and error output

(paddlepaddle-test) [root@nlp68 hub]# hub serving start -c serving_config.json
/root/.conda/envs/paddlepaddle-test/lib/python3.10/site-packages/_distutils_hack/__init__.py:33: UserWarning: Setuptools is replacing distutils.
  warnings.warn("Setuptools is replacing distutils.")
[2023-03-10 11:25:06 +0800] [21115] [INFO] Starting gunicorn 20.1.0
[2023-03-10 11:25:06 +0800] [21115] [INFO] Listening at: http://0.0.0.0:8866 (21115)
[2023-03-10 11:25:06 +0800] [21115] [INFO] Using worker: sync
[2023-03-10 11:25:06 +0800] [21158] [INFO] Booting worker with pid: 21158
[2023-03-10 11:25:06 +0800] [21160] [INFO] Booting worker with pid: 21160
[2023-03-10 11:36:02 +0800] [21115] [CRITICAL] WORKER TIMEOUT (pid:21158)
[2023-03-10 11:36:03 +0800] [21115] [WARNING] Worker with pid 21158 was terminated due to signal 9
[2023-03-10 11:36:03 +0800] [22783] [INFO] Booting worker with pid: 22783
[2023-03-10 11:36:18 +0800] [21115] [CRITICAL] WORKER TIMEOUT (pid:21160)
[2023-03-10 11:36:19 +0800] [21115] [WARNING] Worker with pid 21160 was terminated due to signal 9
[2023-03-10 11:36:19 +0800] [22819] [INFO] Booting worker with pid: 22819
[2023-03-10 11:36:49 +0800] [21115] [CRITICAL] WORKER TIMEOUT (pid:22819)
[2023-03-10 11:36:50 +0800] [22819] [INFO] Worker exiting (pid: 22819)
[2023-03-10 11:36:51 +0800] [21115] [WARNING] Worker with pid 22819 was terminated due to signal 9
[2023-03-10 11:36:51 +0800] [22902] [INFO] Booting worker with pid: 22902
[2023-03-10 11:37:02 +0800] [21115] [CRITICAL] WORKER TIMEOUT (pid:22783)
[2023-03-10 11:37:03 +0800] [21115] [WARNING] Worker with pid 22783 was terminated due to signal 9
[2023-03-10 11:37:04 +0800] [22939] [INFO] Booting worker with pid: 22939
[2023-03-10 11:37:36 +0800] [21115] [CRITICAL] WORKER TIMEOUT (pid:22902)
[2023-03-10 11:37:37 +0800] [21115] [WARNING] Worker with pid 22902 was terminated due to signal 9
[2023-03-10 11:37:37 +0800] [23028] [INFO] Booting worker with pid: 23028
[2023-03-10 11:37:38 +0800] [21115] [CRITICAL] WORKER TIMEOUT (pid:22939)
[2023-03-10 11:37:38 +0800] [22939] [INFO] Worker exiting (pid: 22939)
[2023-03-10 11:37:39 +0800] [21115] [WARNING] Worker with pid 22939 was terminated due to signal 9
[2023-03-10 11:37:39 +0800] [23032] [INFO] Booting worker with pid: 23032
[2023-03-10 11:38:09 +0800] [21115] [CRITICAL] WORKER TIMEOUT (pid:23032)
[2023-03-10 11:38:10 +0800] [23032] [INFO] Worker exiting (pid: 23032)
[2023-03-10 11:38:10 +0800] [21115] [WARNING] Worker with pid 23032 was terminated due to signal 9
[2023-03-10 11:38:10 +0800] [23103] [INFO] Booting worker with pid: 23103
[2023-03-10 11:38:54 +0800] [21115] [CRITICAL] WORKER TIMEOUT (pid:23028)
[2023-03-10 11:38:55 +0800] [21115] [WARNING] Worker with pid 23028 was terminated due to signal 9
[2023-03-10 11:38:55 +0800] [23219] [INFO] Booting worker with pid: 23219
[2023-03-10 11:39:02 +0800] [21115] [CRITICAL] WORKER TIMEOUT (pid:23103)
[2023-03-10 11:39:03 +0800] [21115] [WARNING] Worker with pid 23103 was terminated due to signal 9
[2023-03-10 11:39:03 +0800] [23241] [INFO] Booting worker with pid: 23241
[2023-03-10 11:39:25 +0800] [21115] [CRITICAL] WORKER TIMEOUT (pid:23219)
[2023-03-10 11:39:27 +0800] [21115] [WARNING] Worker with pid 23219 was terminated due to signal 9
[2023-03-10 11:39:27 +0800] [23294] [INFO] Booting worker with pid: 23294
[2023-03-10 11:40:20 +0800] [21115] [CRITICAL] WORKER TIMEOUT (pid:23294)
[2023-03-10 11:40:21 +0800] [21115] [WARNING] Worker with pid 23294 was terminated due to signal 9
[2023-03-10 11:40:21 +0800] [23432] [INFO] Booting worker with pid: 23432
[2023-03-10 11:40:25 +0800] [21115] [CRITICAL] WORKER TIMEOUT (pid:23241)
[2023-03-10 11:40:26 +0800] [23241] [INFO] Worker exiting (pid: 23241)
[2023-03-10 11:40:26 +0800] [21115] [WARNING] Worker with pid 23241 was terminated due to signal 9
[2023-03-10 11:40:26 +0800] [23446] [INFO] Booting worker with pid: 23446
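
To make the pattern in the log easier to read, the time between each "Booting worker" line and the matching "WORKER TIMEOUT" line can be computed per pid. This is a minimal sketch; worker_lifetimes is a hypothetical helper, and it ignores the UTC offset because every line in this log uses +0800. The sample uses two worker pids copied from the log above.

```python
import re
from datetime import datetime

# Timestamp at the start of a gunicorn log line, e.g. "[2023-03-10 11:36:19 +0800]".
STAMP_RE = re.compile(r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) [+-]\d{4}\]")
BOOT_RE = re.compile(r"Booting worker with pid: (\d+)")
TIMEOUT_RE = re.compile(r"WORKER TIMEOUT \(pid:(\d+)\)")


def worker_lifetimes(log_text):
    """Return {pid: seconds from 'Booting worker' to 'WORKER TIMEOUT'}."""
    booted = {}
    lifetimes = {}
    for line in log_text.splitlines():
        line = line.strip()
        stamp_match = STAMP_RE.match(line)
        if not stamp_match:
            continue  # skip warnings and shell prompt lines
        stamp = datetime.strptime(stamp_match.group(1), "%Y-%m-%d %H:%M:%S")
        boot = BOOT_RE.search(line)
        if boot:
            booted[int(boot.group(1))] = stamp
            continue
        timeout = TIMEOUT_RE.search(line)
        if timeout and int(timeout.group(1)) in booted:
            pid = int(timeout.group(1))
            lifetimes[pid] = (stamp - booted[pid]).total_seconds()
    return lifetimes


sample = """\
[2023-03-10 11:36:19 +0800] [22819] [INFO] Booting worker with pid: 22819
[2023-03-10 11:36:49 +0800] [21115] [CRITICAL] WORKER TIMEOUT (pid:22819)
[2023-03-10 11:37:39 +0800] [23032] [INFO] Booting worker with pid: 23032
[2023-03-10 11:38:09 +0800] [21115] [CRITICAL] WORKER TIMEOUT (pid:23032)
"""
for pid, seconds in worker_lifetimes(sample).items():
    print(pid, seconds)  # 22819 30.0, then 23032 30.0
```

Both sampled workers are killed exactly 30 seconds after booting, which matches gunicorn's default worker timeout of 30 seconds. That may suggest the "timeout": 3000 value is not reaching gunicorn, but other workers in the log live longer (gunicorn counts the timeout from a worker's last heartbeat), so this is a hint to investigate rather than a conclusion.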

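4. Attempted fixes (none worked)

  • workers was originally set to 8; lowering it to 2 did not change the behavior.
  • Setting "timeout": 3000 in the config file did not change the behavior either.
  • Insufficient memory was suspected, but about half of the memory (16 GB) is still available.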

water-2022 • Mar 10 '23 04:03