
federatedai/python:1.8.0-release container keeps restarting

Open: quanh1990 opened this issue 2 years ago • 1 comment

All Docker containers are listed below:

CONTAINER ID   IMAGE                                     COMMAND                  CREATED       STATUS                          PORTS                                                                                                                                           NAMES
7b6bd64ef7d4   federatedai/serving-proxy:2.0.4-release   "/bin/sh -c 'java -D…"   2 hours ago   Up 2 hours                      0.0.0.0:8059->8059/tcp, :::8059->8059/tcp, 0.0.0.0:8869->8869/tcp, :::8869->8869/tcp, 8879/tcp                                                  serving-10000-serving-proxy-1
c8b4f4d8261e   federatedai/serving-admin:2.0.4-release   "/bin/sh -c 'java -c…"   2 hours ago   Up 2 hours                      8000/tcp, 0.0.0.0:8350->8350/tcp, :::8350->8350/tcp                                                                                             serving-10000-serving-admin-1
e04e8f5cd40f   bitnami/zookeeper:3.7.0                   "/opt/bitnami/script…"   2 hours ago   Up 2 hours                      0.0.0.0:2181->2181/tcp, :::2181->2181/tcp, 8080/tcp, 0.0.0.0:49156->2888/tcp, :::49156->2888/tcp, 0.0.0.0:49155->3888/tcp, :::49155->3888/tcp   serving-10000-serving-zookeeper-1
0e24c124ca26   federatedai/client:1.8.0-release          "/bin/sh -c 'flow in…"   2 hours ago   Up 2 hours                      0.0.0.0:20000->20000/tcp, :::20000->20000/tcp                                                                                                   confs-10000-client-1
9c69b80b4ef0   federatedai/fateboard:1.8.0-release       "/bin/sh -c 'java -D…"   2 hours ago   Up 2 hours                      0.0.0.0:8080->8080/tcp, :::8080->8080/tcp                                                                                                       confs-10000-fateboard-1
46cee6e55b06   federatedai/python:1.8.0-release          "container-entrypoin…"   2 hours ago   Restarting (1) 49 seconds ago                                                                                                                                                   confs-10000-python-1
5422f5ab596b   federatedai/eggroll:1.8.0-release         "/tini -- bash -c 'j…"   2 hours ago   Up 2 hours                      4671/tcp, 8080/tcp                                                                                                                              confs-10000-nodemanager-1
f82fddd9eea3   federatedai/eggroll:1.8.0-release         "/tini -- bash -c 'j…"   2 hours ago   Up 2 hours                      8080/tcp, 0.0.0.0:9370->9370/tcp, :::9370->9370/tcp                                                                                             confs-10000-rollsite-1
2de0bdb61314   federatedai/eggroll:1.8.0-release         "/tini -- bash -c 'j…"   2 hours ago   Up 2 hours                      4670/tcp, 8080/tcp                                                                                                                              confs-10000-clustermanager-1
422f2f0e6440   mysql:8.0.28                              "docker-entrypoint.s…"   2 hours ago   Up 2 hours                      3306/tcp, 33060/tcp                                                                                                                             confs-10000-mysql-1

The federatedai/python:1.8.0-release container stays in the Restarting state. Its error log is as follows:

+ python fateflow/python/fate_flow/fate_flow_server.py
2022-05-25T06:28:04.855239420Z Traceback (most recent call last):
2022-05-25T06:28:04.855287176Z   File "fateflow/python/fate_flow/fate_flow_server.py", line 57, in <module>
2022-05-25T06:28:04.855387775Z     init_flow_db()
2022-05-25T06:28:04.855412244Z   File "/opt/app-root/lib/python3.6/site-packages/peewee.py", line 393, in inner
2022-05-25T06:28:04.855539446Z     with self:
2022-05-25T06:28:04.855547109Z   File "/opt/app-root/lib/python3.6/site-packages/peewee.py", line 2724, in __enter__
2022-05-25T06:28:04.856214811Z     self.db.connect()
2022-05-25T06:28:04.856224624Z   File "/opt/app-root/lib/python3.6/site-packages/playhouse/pool.py", line 108, in connect
2022-05-25T06:28:04.856308749Z     return super(PooledDatabase, self).connect(reuse_if_open)
2022-05-25T06:28:04.856316107Z   File "/opt/app-root/lib/python3.6/site-packages/peewee.py", line 2808, in connect
2022-05-25T06:28:04.856968090Z     raise InterfaceError('Error, database must be initialized '

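I then checked the MySQL logs and found that database initialization failed. While executing /data/projects/fate/confs-10000/confs/mysql/init/create-eggroll-meta-tables.sql, it tries to create a database with an empty name (''), which MySQL does not allow: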

CREATE DATABASE IF NOT EXISTS ``;
2022-05-25T06:25:43.121160692Z 2022-05-25 06:25:43+00:00 [Note] [Entrypoint]: /usr/local/bin/docker-entrypoint.sh: running /docker-entrypoint-initdb.d/create-eggroll-meta-tables.sql
2022-05-25T06:25:43.129941582Z ERROR 1102 (42000) at line 2: Incorrect database name ''

Back on the host, I looked at the original init SQL, located at docker-deploy/training_template/public/mysql/init/create-eggroll-meta-tables.sql, and found that the CREATE DATABASE statement there does have a name:

CREATE DATABASE IF NOT EXISTS `eggroll_meta`;

However, in the SQL file inside the generated cluster startup files under ~/docker-deploy/outputs on the host, the database name is missing. It also reads:

CREATE DATABASE IF NOT EXISTS ``;
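
To catch this before starting the containers, the generated SQL can be scanned for empty backtick identifiers. The following is a minimal sketch, not part of FATE or its deployment scripts: the function name `find_empty_db_names` and the sample strings are hypothetical, and it only looks at `CREATE DATABASE` and `USE` statements.

```python
import re

# Matches an empty backtick-quoted identifier (``) directly after
# CREATE DATABASE [IF NOT EXISTS] or USE at the start of a line.
EMPTY_DB_NAME = re.compile(
    r"^\s*(CREATE\s+DATABASE(?:\s+IF\s+NOT\s+EXISTS)?|USE)\s+``\s*;?",
    re.IGNORECASE,
)


def find_empty_db_names(sql_text):
    """Return (line_number, line) pairs whose database identifier is empty."""
    hits = []
    for lineno, line in enumerate(sql_text.splitlines(), start=1):
        if EMPTY_DB_NAME.search(line):
            hits.append((lineno, line.strip()))
    return hits


# Illustrative inputs only; the real files contain more statements.
generated = "-- sample header\nCREATE DATABASE IF NOT EXISTS ``;\nUSE ``;\n"
template = "CREATE DATABASE IF NOT EXISTS `eggroll_meta`;\nUSE `eggroll_meta`;\n"

print(find_empty_db_names(generated))  # lines with an empty name
print(find_empty_db_names(template))   # nothing to report
```

To check a real file, read it and pass its text to `find_empty_db_names`, for example the create-eggroll-meta-tables.sql under the generated outputs folder.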

So is it mandatory to configure the MySQL settings in the conf file on the host?

# modify if you are going to use an external db
mysql_ip=mysql
mysql_user=fate
mysql_password=fate_dev
mysql_db=fate_flow
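
One way to rule out the configuration as the cause is to check that none of these four keys is empty or commented out before generating the deployment files. This is a hedged sketch: whether a missing value is what actually produces the empty database name is not confirmed in this thread, and the helper names `parse_conf` and `missing_mysql_settings` are hypothetical. Only the four key names come from the config above.

```python
def parse_conf(text):
    """Parse shell-style KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding quotes, if any, around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


# Key names taken from the config block above.
REQUIRED_KEYS = ("mysql_ip", "mysql_user", "mysql_password", "mysql_db")


def missing_mysql_settings(text):
    """Return the required keys that are absent or set to an empty value."""
    values = parse_conf(text)
    return [key for key in REQUIRED_KEYS if not values.get(key)]


sample = """# modify if you are going to use an external db
mysql_ip=mysql
mysql_user=fate
mysql_password=fate_dev
mysql_db=fate_flow
"""

print(missing_mysql_settings(sample))
```

If the returned list is non-empty, fill in those keys and regenerate the outputs, then re-check the generated SQL.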

quanh1990, May 25 '22 03:05

I ran into the same problem. How did you solve it?

gxcuit, Sep 15 '22 10:09