FATE
Job fails after deploying a fate_on_spark + pulsar cluster
The error is as follows:
(venv) [app@vm-0-2-centos conf]$ flow test toy -gid 9999 -hid 10000
{
"jobId": "202203221731177821370",
"retcode": 103,
"retmsg": "Traceback (most recent call last):\n File "/data/projects/fate/fateflow/python/fate_flow/scheduler/dag_scheduler.py", line 133, in submit\n raise Exception("create job failed", response)\nException: ('create job failed', {'guest': {9999: {'retcode': <RetCode.FEDERATED_ERROR: 104>, 'retmsg': 'Federated schedule error, Expecting value: line 1 column 1 (char 0)'}}, 'host': {10000: {'retcode': <RetCode.FEDERATED_ERROR: 104>, 'retmsg': 'Federated schedule error, Expecting value: line 1 column 1 (char 0)'}}})\n"
}
There is also a port mismatch I ran into while installing and configuring openresty-nginx. In the official documentation, the proxy port in the routing file route_table.yaml is 9390, while the server port in coordination_http_proxy.conf is 9300, and once started the service listens on 9300. The official documentation nevertheless sets the nginx http_port to 9390 in the FATE configuration file service_conf.yaml, so port 9390 cannot be reached when a job starts; the value should be 9300. However, after starting nginx, opening that port in a browser returns the following error:
An error occurred. Sorry, the page you are looking for is currently unavailable. Please try again later. If you are the system administrator of this resource then you should check the error log for details. Faithfully yours, OpenResty
I don't know whether this affects job execution. If it does, how should it be fixed?
I'm hitting the same error. Has this been resolved?
Details: I deployed FATE on Spark (RabbitMQ), started nginx, fate, and fateboard, then ran the single-party test flow test toy -gid 10000 -hid 10000
{
"jobId": "202206012243132178820",
"retcode": 103,
"retmsg": "Traceback (most recent call last):\n File "/data/projects/fate/fateflow/python/fate_flow/scheduler/dag_scheduler.py", line 133, in submit\n raise Exception("create job failed", response)\nException: ('create job failed', {'guest': {10000: {'retcode': <RetCode.FEDERATED_ERROR: 104>, 'retmsg': 'Federated schedule error, Expecting value: line 1 column 1 (char 0)'}}, 'host': {10000: {'retcode': <RetCode.FEDERATED_ERROR: 104>, 'retmsg': 'Federated schedule error, Expecting value: line 1 column 1 (char 0)'}}})\n"
}
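The Nginx port in the FATE conf has already been changed to 9300. After starting nginx the listening ports show up as below, and opening port 9300 in a browser shows the same error page: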
tcp 0 0 0.0.0.0:9300 0.0.0.0:* LISTEN 70481/nginx: master
tcp 0 0 0.0.0.0:9302 0.0.0.0:* LISTEN 70481/nginx: master
tcp 0 0 0.0.0.0:9310 0.0.0.0:* LISTEN 70481/nginx: master
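It's resolved. These nginx ports only act as proxies, so it is normal that they cannot be opened as pages in a browser. The job error itself is caused by an incorrect service_conf.yaml in the FATE configuration.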
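For anyone wondering about the wording "Expecting value: line 1 column 1 (char 0)": that is the message Python's standard json module produces when it is handed a body that is empty or not JSON at all. A minimal sketch follows. The HTML string is a made-up stand-in, and whether FATE Flow actually received such a body here is an assumption, not something confirmed in this thread.

```python
import json

# Hypothetical stand-in for a non-JSON reply, e.g. a proxy's HTML error page.
html_error_page = "<html><body>An error occurred.</body></html>"


def try_parse(body: str) -> str:
    """Return 'ok' if body parses as JSON, otherwise the decoder's error message."""
    try:
        json.loads(body)
        return "ok"
    except json.JSONDecodeError as exc:
        return str(exc)


print(try_parse(html_error_page))   # non-JSON body
print(try_parse(""))                # empty body
print(try_parse('{"retcode": 0}'))  # valid JSON
```

If the scheduler's request lands on the wrong port or a misrouted proxy, it would plausibly get back something it cannot decode as JSON, which fits a configuration problem rather than a bug in the job itself.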
Thanks for the reply! In that case, should service_conf.yaml be changed so that nginx simply uses http_port: 9300? Is there anything else to watch out for?
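Yes. The main things to check are service_conf.yaml and the route table file in the same directory. Once those two are configured correctly, it should work.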
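Since a browser test is not meaningful for these proxy ports, a plain TCP check is one way to confirm that the port written in service_conf.yaml is the one nginx is really listening on. This is only a rough sketch: port_is_open is a hypothetical helper, not part of FATE, and the host and ports below are the values mentioned in this thread, so adjust them to your deployment.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # Assumed values from this thread: nginx listens on 9300, docs mention 9390.
    for port in (9300, 9390):
        print(port, port_is_open("127.0.0.1", port))
```

A True for the port configured as the nginx http_port only shows that something is listening there; the route table still has to point at the right addresses.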