kubeflow-manifests
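Database-related pods fail to start with errors
All the other pods start, but the database-related pods katib-db-manager and katib-mysql report errors. Their logs are as follows: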
- katib-db-manager:
E0827 03:18:05.755835 1 mysql.go:78] Ping to Katib db failed: dial tcp 10.96.67.181:3306: connect: connection refused
E0827 03:18:10.758696 1 mysql.go:78] Ping to Katib db failed: dial tcp 10.96.67.181:3306: connect: connection refused
E0827 03:18:15.754750 1 mysql.go:78] Ping to Katib db failed: dial tcp 10.96.67.181:3306: connect: connection refused
E0827 03:18:20.756393 1 mysql.go:78] Ping to Katib db failed: dial tcp 10.96.67.181:3306: connect: connection refused
E0827 03:18:25.756346 1 mysql.go:78] Ping to Katib db failed: dial tcp 10.96.67.181:3306: connect: connection refused
E0827 03:18:30.758046 1 mysql.go:78] Ping to Katib db failed: dial tcp 10.96.67.181:3306: connect: connection refused
E0827 03:18:35.758436 1 mysql.go:78] Ping to Katib db failed: dial tcp 10.96.67.181:3306: connect: connection refused
E0827 03:18:40.756272 1 mysql.go:78] Ping to Katib db failed: dial tcp 10.96.67.181:3306: connect: connection refused
E0827 03:18:45.756977 1 mysql.go:78] Ping to Katib db failed: dial tcp 10.96.67.181:3306: connect: connection refused
E0827 03:18:50.754163 1 mysql.go:78] Ping to Katib db failed: dial tcp 10.96.67.181:3306: connect: connection refused
E0827 03:18:55.754928 1 mysql.go:78] Ping to Katib db failed: dial tcp 10.96.67.181:3306: connect: connection refused
E0827 03:19:00.755864 1 mysql.go:78] Ping to Katib db failed: dial tcp 10.96.67.181:3306: connect: connection refused
F0827 03:19:00.755932 1 main.go:99] Failed to open db connection: DB open failed: Timeout waiting for DB conn successfully opened.
- katib-mysql:
2021-08-27 02:31:36+00:00 [Note] [Entrypoint]: Entrypoint script for MySQL Server 8.0.24-1debian10 started.
2021-08-27 02:31:36+00:00 [Note] [Entrypoint]: Switching to dedicated user 'mysql'
2021-08-27 02:31:36+00:00 [Note] [Entrypoint]: Entrypoint script for MySQL Server 8.0.24-1debian10 started.
2021-08-27 02:31:36+00:00 [Note] [Entrypoint]: Initializing database files
2021-08-27T02:31:36.865722Z 0 [System] [MY-013169] [Server] /usr/sbin/mysqld (mysqld 8.0.24) initializing of server in progress as process 44
2021-08-27T02:31:36.870024Z 1 [System] [MY-013576] [InnoDB] InnoDB initialization has started.
2021-08-27T02:31:52.754440Z 1 [System] [MY-013577] [InnoDB] InnoDB initialization has ended.
2021-08-27T02:32:53.159102Z 6 [Warning] [MY-010453] [Server] root@localhost is created with an empty password ! Please consider switching off the --initialize-insecure option.
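The root cause appears to be the MySQL pod: the katib-db-manager pod cannot connect to MySQL, but I don't know how to fix it. The logs above are from kindest/node:v1.16.9. With v1.19.1, katib-mysql reports the following errors instead: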
2021-08-27 01:54:20+00:00 [Note] [Entrypoint]: Entrypoint script for MySQL Server 8.0.24-1debian10 started.
2021-08-27 01:54:20+00:00 [Note] [Entrypoint]: Switching to dedicated user 'mysql'
2021-08-27 01:54:20+00:00 [Note] [Entrypoint]: Entrypoint script for MySQL Server 8.0.24-1debian10 started.
2021-08-27T01:54:20.642846Z 0 [System] [MY-010116] [Server] /usr/sbin/mysqld (mysqld 8.0.24) starting as process 1
2021-08-27T01:54:20.670463Z 1 [System] [MY-013576] [InnoDB] InnoDB initialization has started.
2021-08-27T01:54:23.849922Z 1 [System] [MY-013577] [InnoDB] InnoDB initialization has ended.
mysqld: Table 'mysql.plugin' doesn't exist
2021-08-27T01:54:24.146624Z 0 [ERROR] [MY-010735] [Server] Could not open the mysql.plugin table. Please perform the MySQL upgrade proced
2021-08-27T01:54:24.148013Z 0 [Warning] [MY-010441] [Server] Failed to open optimizer cost constant tables
2021-08-27T01:54:24.148942Z 0 [Warning] [MY-010441] [Server] Failed to open optimizer cost constant tables
2021-08-27T01:54:24.149946Z 0 [Warning] [MY-010441] [Server] Failed to open optimizer cost constant tables
2021-08-27T01:54:24.151631Z 0 [Warning] [MY-010441] [Server] Failed to open optimizer cost constant tables
2021-08-27T01:54:24.152681Z 0 [Warning] [MY-010441] [Server] Failed to open optimizer cost constant tables
2021-08-27T01:54:24.153661Z 0 [Warning] [MY-010441] [Server] Failed to open optimizer cost constant tables
2021-08-27T01:54:24.154611Z 0 [Warning] [MY-010441] [Server] Failed to open optimizer cost constant tables
2021-08-27T01:54:24.360102Z 0 [System] [MY-011323] [Server] X Plugin ready for connections. Bind-address: '::' port: 33060, socket: /var/k
2021-08-27T01:54:24.675026Z 0 [Warning] [MY-010015] [Repl] Gtid table is not ready to be used. Table 'mysql.gtid_executed' cannot be open
2021-08-27T01:54:26.359732Z 0 [Warning] [MY-010015] [Repl] Gtid table is not ready to be used. Table 'mysql.gtid_executed' cannot be open
2021-08-27T01:54:26.563015Z 0 [Warning] [MY-010068] [Server] CA certificate ca.pem is self signed.
2021-08-27T01:54:26.563668Z 0 [System] [MY-013602] [Server] Channel mysql_main configured to support TLS. Encrypted connections are now snnel.
2021-08-27T01:54:26.808613Z 0 [Warning] [MY-011810] [Server] Insecure configuration for --pid-file: Location '/var/run/mysqld' in the patl OS users. Consider choosing a different directory.
2021-08-27T01:54:26.809436Z 0 [Warning] [MY-010441] [Server] Failed to open optimizer cost constant tables
2021-08-27T01:54:26.810520Z 0 [ERROR] [MY-013129] [Server] A message intended for a client cannot be sent there as no client-session is ae're sending the information to the error-log instead: MY-001146 - Table 'mysql.component' doesn't exist
2021-08-27T01:54:26.810874Z 0 [Warning] [MY-013129] [Server] A message intended for a client cannot be sent there as no client-session is we're sending the information to the error-log instead: MY-003543 - The mysql.component table is missing or has an incorrect definition.
2021-08-27T01:54:26.811771Z 0 [ERROR] [MY-010326] [Server] Fatal error: Can't open and lock privilege tables: Table 'mysql.user' doesn't
2021-08-27T01:54:26.812089Z 0 [ERROR] [MY-010952] [Server] The privilege system failed to initialize correctly. For complete instructionsSQL to a new version please see the 'Upgrading MySQL' section from the MySQL manual.
2021-08-27T01:54:26.812705Z 0 [ERROR] [MY-010119] [Server] Aborting
2021-08-27T01:54:28.384254Z 0 [System] [MY-010910] [Server] /usr/sbin/mysqld: Shutdown complete (mysqld 8.0.24) MySQL Community Server -
I'd appreciate any pointers on how to resolve this. Thanks!
I'm getting the same errors as above. Please take a look:
katib-db-manager log:
E0827 04:22:23.644805 1 mysql.go:78] Ping to Katib db failed: dial tcp 10.96.249.11:3306: connect: connection refused
E0827 04:22:28.668739 1 mysql.go:78] Ping to Katib db failed: dial tcp 10.96.249.11:3306: connect: connection refused
E0827 04:22:33.664700 1 mysql.go:78] Ping to Katib db failed: dial tcp 10.96.249.11:3306: connect: connection refused
E0827 04:22:38.652756 1 mysql.go:78] Ping to Katib db failed: dial tcp 10.96.249.11:3306: connect: connection refused
E0827 04:22:43.644760 1 mysql.go:78] Ping to Katib db failed: dial tcp 10.96.249.11:3306: connect: connection refused
E0827 04:22:48.668705 1 mysql.go:78] Ping to Katib db failed: dial tcp 10.96.249.11:3306: connect: connection refused
E0827 04:22:53.660754 1 mysql.go:78] Ping to Katib db failed: dial tcp 10.96.249.11:3306: connect: connection refused
E0827 04:22:58.652762 1 mysql.go:78] Ping to Katib db failed: dial tcp 10.96.249.11:3306: connect: connection refused
E0827 04:23:03.644786 1 mysql.go:78] Ping to Katib db failed: dial tcp 10.96.249.11:3306: connect: connection refused
E0827 04:23:08.672724 1 mysql.go:78] Ping to Katib db failed: dial tcp 10.96.249.11:3306: connect: connection refused
E0827 04:23:13.660592 1 mysql.go:78] Ping to Katib db failed: dial tcp 10.96.249.11:3306: connect: connection refused
E0827 04:23:18.652703 1 mysql.go:78] Ping to Katib db failed: dial tcp 10.96.249.11:3306: connect: connection refused
F0827 04:23:18.652781 1 main.go:99] Failed to open db connection: DB open failed: Timeout waiting for DB conn successfully opened.
katib-mysql log:
2021-08-27 03:41:54+00:00 [Note] [Entrypoint]: Entrypoint script for MySQL Server 8.0.24-1debian10 started.
2021-08-27 03:41:54+00:00 [Note] [Entrypoint]: Switching to dedicated user 'mysql'
2021-08-27 03:41:54+00:00 [Note] [Entrypoint]: Entrypoint script for MySQL Server 8.0.24-1debian10 started.
2021-08-27T03:41:55.090175Z 0 [System] [MY-010116] [Server] /usr/sbin/mysqld (mysqld 8.0.24) starting as process 1
2021-08-27T03:41:55.126744Z 1 [System] [MY-013576] [InnoDB] InnoDB initialization has started.
2021-08-27T03:42:24.589423Z 1 [System] [MY-013577] [InnoDB] InnoDB initialization has ended.
2021-08-27T03:42:24.910933Z 0 [System] [MY-011323] [Server] X Plugin ready for connections. Bind-address: '::' port: 33060, socket: /var/run/mysqld/mysqlx.sock
2021-08-27T03:42:25.157690Z 0 [ERROR] [MY-011947] [InnoDB] Cannot open '/var/lib/mysql/datadir/ib_buffer_pool' for reading: No such file or directory
2021-08-27T03:42:25.499502Z 0 [Warning] [MY-010068] [Server] CA certificate ca.pem is self signed.
2021-08-27T03:42:25.500065Z 0 [System] [MY-013602] [Server] Channel mysql_main configured to support TLS. Encrypted connections are now supported for this channel.
2021-08-27T03:42:25.563667Z 0 [Warning] [MY-011810] [Server] Insecure configuration for --pid-file: Location '/var/run/mysqld' in the path is accessible to all OS users. Consider choosing a different directory.
mysql log:
2021-08-27T03:38:55.396330Z 0 [Note] Event Scheduler: Loaded 0 events
2021-08-27T03:38:55.396640Z 0 [Note] mysqld: ready for connections.
Version: '5.7.33' socket: '/var/run/mysqld/mysqld.sock' port: 3306 MySQL Community Server (GPL)
2021-08-27T03:52:14.357722Z 4 [Note] Aborted connection 4 to db: 'mlpipeline' user: 'root' host: '127.0.0.1' (Got an error reading communication packets)
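The katib-db-manager output above is a run of identical ping failures followed by one fatal timeout, which can be easier to read as a summary. The following is a minimal sketch, assuming the klog-style line layout seen in this thread (severity letter, MMDD, time, thread id, file:line). The names `parse_klog` and `summarize` are hypothetical helpers, not part of Katib, and the sample holds three lines copied from the log above.

```python
import re
from datetime import datetime

# klog-style header as it appears in this thread:
# severity letter, MMDD, time with microseconds, thread id, "file:line]".
KLOG_RE = re.compile(
    r"^(?P<sev>[IWEF])(?P<mmdd>\d{4}) (?P<time>\d{2}:\d{2}:\d{2}\.\d{6})\s+\d+ "
    r"(?P<src>[\w.]+:\d+)\] (?P<msg>.*)$"
)


def parse_klog(lines):
    """Return (severity, time, message) for each line with a klog header."""
    entries = []
    for line in lines:
        m = KLOG_RE.match(line.strip())
        if m:
            t = datetime.strptime(m["time"], "%H:%M:%S.%f")
            entries.append((m["sev"], t, m["msg"]))
    return entries


def summarize(entries):
    """Count refused pings, the time they span, and collect fatal messages."""
    refused = [t for sev, t, msg in entries
               if sev == "E" and "connection refused" in msg]
    fatal = [msg for sev, _, msg in entries if sev == "F"]
    span = (refused[-1] - refused[0]).total_seconds() if refused else 0.0
    return {"refused": len(refused), "span_seconds": span, "fatal": fatal}


# Three lines copied from the katib-db-manager log in this thread.
SAMPLE = """\
E0827 04:22:23.644805 1 mysql.go:78] Ping to Katib db failed: dial tcp 10.96.249.11:3306: connect: connection refused
E0827 04:22:28.668739 1 mysql.go:78] Ping to Katib db failed: dial tcp 10.96.249.11:3306: connect: connection refused
F0827 04:23:18.652781 1 main.go:99] Failed to open db connection: DB open failed: Timeout waiting for DB conn successfully opened.
""".splitlines()

if __name__ == "__main__":
    print(summarize(parse_klog(SAMPLE)))
```

Every refused ping targets port 3306 on the service IP, so the summary only confirms that nothing was listening there; the cause has to be found in the katib-mysql log.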
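Did your deployment hang at any point? When I deployed, it got stuck on the first delete step while running the patch, so I stopped it and ran the steps manually.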
It didn't hang. It's just that these two pods never come up.
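I now have the same problem as you. It looks like a database authentication issue.
@xiashenzhen @WMeng1 Please check whether there is a problem with your PVCs: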
kubectl get pvc -A
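This MySQL application is very simple. The problem may be caused by leftovers from an earlier failed installation that were never deleted. For details on this MySQL deployment, see https://github.com/shikanon/kubeflow-manifests/blob/50ee9f1e0aef5f69620db89c9ae2f81c9b2d96e3/manifest1.3/019-katib-installs-katib-with-kubeflow-cert-manager.yaml#L616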
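As a rough aid for reading the `kubectl get pvc -A` output suggested above, the sketch below lists every PVC whose STATUS is not `Bound`. It assumes the default column layout (NAMESPACE, NAME, STATUS first). The `Bound` row is copied from output posted later in this thread; the `Pending` row, including the PVC name `katib-mysql`, is a hypothetical example, and `unbound_pvcs` is an illustrative helper name.

```python
def unbound_pvcs(text):
    """Return (namespace, name, status) for each PVC that is not Bound.

    Only the first three whitespace-separated columns are read, so rows
    with empty VOLUME/CAPACITY columns (typical for Pending) still parse.
    """
    rows = []
    for line in text.strip().splitlines():
        fields = line.split()
        if len(fields) < 3 or fields[0] == "NAMESPACE":
            continue  # skip the header and malformed lines
        namespace, name, status = fields[:3]
        if status != "Bound":
            rows.append((namespace, name, status))
    return rows


# First data row: copied from this thread. Second data row: hypothetical.
SAMPLE_PVC = """\
NAMESPACE NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
istio-system authservice-pvc Bound pvc-b18f3d6b-dc68-4ec0-b343-ba02cc8dac11 10Gi RWO rook-ceph-block 4d17h
kubeflow katib-mysql Pending rook-ceph-block 29m
"""

if __name__ == "__main__":
    for namespace, name, status in unbound_pvcs(SAMPLE_PVC):
        print(f"{namespace}/{name}: {status}")
```

To use it on a live cluster, the text could come from `subprocess.run(["kubectl", "get", "pvc", "-A"], capture_output=True, text=True).stdout`.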
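My PVC is in the Pending state. I just redeployed this YAML file as follows:
kubectl delete -f "/opt/wangm/kubeflow-manifests/manifest1.3/019-katib-installs-katib-with-kubeflow-cert-manager.yaml"
kubectl apply -f "/opt/wangm/kubeflow-manifests/manifest1.3/019-katib-installs-katib-with-kubeflow-cert-manager.yaml"
However, these pods still won't start. I had also deleted and restarted the cluster many times before, but these two pods never come up. I don't know what I'm doing wrong, and the problem remains unresolved. Thanks for your reply.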
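Thanks for the reply. I checked, and storage is fine. I don't know why, but only these two pods have problems.
Deleting the pods so that they restart doesn't help either.
I'm not sure whether it's a version issue. I'm using kubectl 1.20.5.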
Mine is resolved now. I simply deleted and recreated it:
kubectl delete -f kubeflow-manifests/manifest1.3/019-katib-installs-katib-with-kubeflow-cert-manager.yaml
kubectl apply -f kubeflow-manifests/manifest1.3/019-katib-installs-katib-with-kubeflow-cert-manager.yaml
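After I deleted and recreated it, the PVCs are all Bound. However, although the two pods that connect to the database are in the Running state, READY shows 0/1, and describe shows that they still cannot reach the database.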
First run what's in the patch: delete once, then apply, and finally delete and recreate.
If the pods are running, then it is a configuration problem. Once you find the cause, it can be fixed.
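Since the pods are running, this is a configuration problem: some setting is wrong. Check the logs.
The logs are still the same entries as at the very beginning: katib-mysql shows a single warning, and katib-db-manager reports that it cannot connect to the database.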
@WMeng1 Make sure your PVC has actually been deleted, then apply the 019-katib-installs-katib-with-kubeflow-cert-manager.yaml file on its own. Or, more precisely, you can apply only the deployment that I linked in my comment above.
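The order of operations described here (delete the manifest's resources, make sure the PVC is gone, then apply the manifest again) could be scripted roughly as below. This is a sketch under stated assumptions, not a verified fix: the PVC name `katib-mysql` is an assumption that should be checked against the `kubectl get pvc -A` output, the namespace `kubeflow` is taken from the pod listing in this thread, and `redeploy_commands` and `run` are hypothetical helpers. Deleting a PVC may discard whatever data the volume holds, so the script only prints the commands unless `dry_run=False` is passed.

```python
import shlex
import subprocess

# Manifest path as used in this thread (relative to the repository checkout).
MANIFEST = ("kubeflow-manifests/manifest1.3/"
            "019-katib-installs-katib-with-kubeflow-cert-manager.yaml")


def redeploy_commands(manifest, pvc_name, namespace):
    """Build the kubectl invocations for a clean redeploy of one manifest."""
    return [
        # Remove everything the manifest created; tolerate missing objects.
        ["kubectl", "delete", "-f", manifest, "--ignore-not-found"],
        # Make sure no PVC from an earlier failed install survives.
        ["kubectl", "delete", "pvc", pvc_name, "-n", namespace,
         "--ignore-not-found"],
        # Inspect what is left before recreating.
        ["kubectl", "get", "pvc", "-A"],
        # Recreate the resources from the manifest alone.
        ["kubectl", "apply", "-f", manifest],
    ]


def run(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv in commands:
        print(shlex.join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    # "katib-mysql" is an assumed PVC name; confirm it before a real run.
    run(redeploy_commands(MANIFEST, "katib-mysql", "kubeflow"))
```

Keeping the default dry run makes it possible to review the exact commands before anything on the cluster changes.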
@shikanon I've run into the same problem: katib-mysql is failing, and again the katib-db-manager and katib-mysql pods are not coming up:
[root@master kubeflow-manifests]# kubectl get pod -nkubeflow
NAME READY STATUS RESTARTS AGE
admission-webhook-deployment-5f5cc7968b-ck6wk 1/1 Running 0 4d17h
cache-deployer-deployment-64598b6c87-rk4x6 2/2 Running 1 32m
cache-server-59d67c7584-mt6xs 2/2 Running 0 32m
centraldashboard-7b6b6cc7fc-7fg5s 1/1 Running 0 4d17h
jupyter-web-app-deployment-7c6974bb88-cdc6w 1/1 Running 0 4d17h
katib-controller-7b784c44dd-6r56w 1/1 Running 0 29m
katib-db-manager-6c5757dc64-6f6v9 0/1 CrashLoopBackOff 8 29m
katib-mysql-79d75c7444-g4zkv 0/1 Running 1 29m
katib-ui-69f5b6795d-rpxtg 1/1 Running 0 29m
kfserving-controller-manager-0 2/2 Running 0 4d17h
kubeflow-pipelines-profile-controller-76c45c8c6b-8b9gm 1/1 Running 0 32m
metacontroller-0 1/1 Running 0 32m
metadata-envoy-deployment-56f745f7fb-gwt8n 1/1 Running 0 32m
metadata-grpc-deployment-6494577fdb-xm7qp 2/2 Running 1 32m
metadata-writer-b7ff9787-jl6xq 2/2 Running 1 32m
minio-57bcb749d5-7ph7n 2/2 Running 0 32m
ml-pipeline-66bcb9d79d-h5p72 2/2 Running 0 32m
ml-pipeline-persistenceagent-7fb8f6dc68-mwkbg 2/2 Running 0 32m
ml-pipeline-scheduledworkflow-64bcfd6596-xtwlt 2/2 Running 0 32m
ml-pipeline-ui-8578f6685f-2ws4h 2/2 Running 0 32m
ml-pipeline-viewer-crd-565fb9b5c5-qkzkx 2/2 Running 1 32m
ml-pipeline-visualizationserver-b7c7d49fb-qbckt 2/2 Running 0 32m
mpi-operator-794849c566-xc7g4 1/1 Running 2 4d17h
mxnet-operator-6668d797d4-s2pth 1/1 Running 2 4d17h
mysql-9dfc684cd-lqjwr 2/2 Running 0 32m
notebook-controller-deployment-6795dd887b-gctm4 1/1 Running 0 4d17h
profiles-deployment-84bd4f9bc7-dj7d5 2/2 Running 0 4d17h
pytorch-operator-6887749499-n59mc 2/2 Running 5 4d17h
tensorboard-controller-controller-manager-dd896c8df-c8gns 3/3 Running 15 4d17h
tensorboards-web-app-deployment-5969cd5b68-mwcxn 1/1 Running 0 4d17h
tf-job-operator-ccb48b77b-2c9vm 1/1 Running 2 4d17h
volumes-web-app-deployment-867dfb5b5c-vxbfh 1/1 Running 0 4d17h
workflow-controller-74b88f9855-rvd2h 2/2 Running 2 32m
xgboost-operator-deployment-665cf9bf8d-wz2vb 2/2 Running 1 4d17h
Checking the PVC status:
NAMESPACE NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
istio-system authservice-pvc Bound pvc-b18f3d6b-dc68-4ec0-b343-ba02cc8dac11 10Gi RWO rook-ceph-block 4d17h
kubeflow katib-mysql Bound pvc-f2d493fd-7e18-4347-8573-a2ef8d97b466 10Gi RWO rook-ceph-block 31m
kubeflow minio-pvc Bound pvc-62fa4d01-3bce-4e89-89e2-84cf2889e361 20Gi RWO rook-ceph-block 35m
kubeflow mysql-pv-claim Bound pvc-f60480e7-f5a2-4991-8e92-40ea95f4d953 20Gi RWO rook-ceph-block 35m
The events are as follows:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 22m default-scheduler 0/3 nodes are available: 3 pod has unbound immediate PersistentVolumeClaims.
Warning FailedScheduling 22m default-scheduler 0/3 nodes are available: 3 pod has unbound immediate PersistentVolumeClaims.
Normal Scheduled 22m default-scheduler Successfully assigned kubeflow/katib-mysql-79d75c7444-g4zkv to node2
Normal SuccessfulAttachVolume 22m attachdetach-controller AttachVolume.Attach succeeded for volume "pvc-f2d493fd-7e18-4347-8573-a2ef8d97b466"
Warning Unhealthy 20m (x3 over 21m) kubelet Liveness probe failed: mysqladmin: [Warning] Using a password on the command line interface can be insecure.
mysqladmin: connect to server at 'localhost' failed
error: 'Can't connect to local MySQL server through socket '/var/run/mysqld/mysqld.sock' (2)'
Check that mysqld is running and that the socket: '/var/run/mysqld/mysqld.sock' exists!
Normal Killing 20m kubelet Container katib-mysql failed liveness probe, will be restarted
Normal Pulled 20m (x2 over 21m) kubelet Container image "registry.cn-shenzhen.aliyuncs.com/tensorbytes/mysql:8-0627e" already present on machine
Normal Created 20m (x2 over 21m) kubelet Created container katib-mysql
Normal Started 20m (x2 over 21m) kubelet Started container katib-mysql
Warning Unhealthy 20m kubelet Readiness probe failed: OCI runtime exec failed: exec failed: container_linux.go:349: starting container process caused "process_linux.go:101: executing setns process caused \"exit status 1\"": unknown
Warning Unhealthy 20m (x9 over 21m) kubelet Readiness probe failed: mysql: [Warning] Using a password on the command line interface can be insecure.
ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/var/run/mysqld/mysqld.sock' (2)
Warning Unhealthy 98s (x111 over 19m) kubelet Readiness probe failed: mysql: [Warning] Using a password on the command line interface can be insecure.
ERROR 1045 (28000): Access denied for user 'root'@'localhost' (using password: YES)
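I don't know how to troubleshoot this error. It looks like a database authentication problem. I tried resetting all the MySQL-related PVCs and redeploying, but it had no effect. I would appreciate any pointers. Below is the katib-mysql log: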
[root@master kubeflow-manifests]# kubectl -n kubeflow logs katib-mysql-79d75c7444-g4zkv
2021-11-24 01:37:12+00:00 [Note] [Entrypoint]: Entrypoint script for MySQL Server 8.0.24-1debian10 started.
2021-11-24 01:37:12+00:00 [Note] [Entrypoint]: Switching to dedicated user 'mysql'
2021-11-24 01:37:12+00:00 [Note] [Entrypoint]: Entrypoint script for MySQL Server 8.0.24-1debian10 started.
2021-11-24T01:37:13.250055Z 0 [System] [MY-010116] [Server] /usr/sbin/mysqld (mysqld 8.0.24) starting as process 1
2021-11-24T01:37:13.265138Z 1 [System] [MY-013576] [InnoDB] InnoDB initialization has started.
2021-11-24T01:37:23.637424Z 1 [System] [MY-013577] [InnoDB] InnoDB initialization has ended.
InnoDB: Progress in percents: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 172021-11-24T01:37:23.956283Z 0 [System] [MY-011323] [Server] X Plugin ready for connections. Bind-address: '::' port: 33060, socket: /var/run/mysqld/mysqlx.sock
18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 442021-11-24T01:37:24.198622Z 0 [System] [MY-010229] [Server] Starting XA crash recovery...
45 462021-11-24T01:37:24.217075Z 0 [System] [MY-010232] [Server] XA crash recovery finished.
47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 792021-11-24T01:37:24.553035Z 0 [Warning] [MY-010068] [Server] CA certificate ca.pem is self signed.
2021-11-24T01:37:24.553418Z 0 [System] [MY-013602] [Server] Channel mysql_main configured to support TLS. Encrypted connections are now supported for this channel.
802021-11-24T01:37:24.561314Z 0 [Warning] [MY-011810] [Server] Insecure configuration for --pid-file: Location '/var/run/mysqld' in the path is accessible to all OS users. Consider choosing a different directory.
81 82 83 842021-11-24T01:37:24.597879Z 0 [System] [MY-010931] [Server] /usr/sbin/mysqld: ready for connections. Version: '8.0.24' socket: '/var/run/mysqld/mysqld.sock' port: 3306 MySQL Community Server - GPL.
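One thing worth checking in a log like the one above: the official MySQL image's entrypoint applies MYSQL_ROOT_PASSWORD only when it initializes an empty data directory, and it logs "Initializing database files" when it does so. That line is absent here, which suggests (as a hypothesis, not a confirmed diagnosis) that this start reused an existing data directory, either from an earlier deployment or from a first initialization that the liveness probe interrupted. Either case could explain the "Access denied" readiness failures. The helper below is a hypothetical sketch that classifies a log read from stdin.

```shell
# Sketch: report whether a MySQL container log shows a fresh initialization.
# Reads the log on stdin. Prints "initialized" if the entrypoint created a
# new data directory during that start, otherwise "reused".
# The function name is hypothetical; the marker string is the entrypoint
# message quoted at the top of this thread.
mysql_init_state() {
  if grep -q 'Initializing database files'; then
    echo "initialized"
  else
    echo "reused"
  fi
}
```

A possible use, assuming the Deployment is named katib-mysql: `kubectl -n kubeflow logs deployment/katib-mysql | mysql_init_state`, and the same again with `--previous` to inspect the container instance from before the restart.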
@Perhurb You can try deleting all the PVCs in the kubeflow namespace and then applying again.
Hi, I ran into the same problem as you. Did you solve it in the end? How did you solve it?
Solved.