[Bug] [Module Name] Bug title

Open WillJiang12 opened this issue 1 year ago • 4 comments

Search before asking

  • [X] I had searched in the issues and found no similar issues.

What happened

Cluster deployment fails with the prompt: distribution failed, please check the agent-side logs. 20230516191249

What you expected to happen

[root@p-mn-01 logs]# cat datasophon-worker.log
[INFO] 2023-05-16 19:00:46 com.datasophon.common.utils.ShellUtils:[96] - The script returned the following data: x86_64
[INFO] 2023-05-16 19:00:46 com.datasophon.common.utils.ShellUtils:[169] - stopping node
[INFO] 2023-05-16 19:00:49 com.datasophon.common.utils.ShellUtils:[169] - End stop node.
[INFO] 2023-05-16 19:00:49 com.datasophon.common.utils.ShellUtils:[145] - script execute success
[INFO] 2023-05-16 19:00:49 com.datasophon.worker.WorkerApplicationServer:[166] - Worker server stopped
[INFO] 2023-05-16 19:00:54 com.datasophon.common.utils.ShellUtils:[96] - The script returned the following data: x86_64
[INFO] 2023-05-16 19:00:54 akka.event.slf4j.Slf4jLogger:[92] - Slf4jLogger started
[INFO] 2023-05-16 19:00:54 akka.remote.Remoting:[83] - Starting remoting
[INFO] 2023-05-16 19:00:55 akka.remote.Remoting:[83] - Remoting started; listening on addresses :[akka.tcp://datasophon@p-mn-01:2552]
[INFO] 2023-05-16 19:00:55 akka.remote.Remoting:[83] - Remoting now listens on addresses: [akka.tcp://datasophon@p-mn-01:2552]
[INFO] 2023-05-16 19:00:55 com.datasophon.common.utils.ShellUtils:[169] - no node to stop
[INFO] 2023-05-16 19:01:05 com.datasophon.common.utils.ShellUtils:[169] - starting node, logging to /opt/datasophon/datasophon-worker/node/x86/logs/node-p-mn-01.out
[INFO] 2023-05-16 19:01:05 com.datasophon.common.utils.ShellUtils:[169] - nohup /opt/datasophon/datasophon-worker/node/x86/node_exporter > /opt/datasophon/datasophon-worker/node/x86/logs/node-p-mn-01.out 2>&1 &
[INFO] 2023-05-16 19:01:05 com.datasophon.common.utils.ShellUtils:[145] - script execute success
[INFO] 2023-05-16 19:01:05 com.datasophon.common.utils.ShellUtils:[169] - End restart node.
[INFO] 2023-05-16 19:01:05 com.datasophon.common.utils.ShellUtils:[145] - script execute success
[INFO] 2023-05-16 19:01:05 com.datasophon.common.utils.ShellUtils:[169] - uid=1003(hive) gid=1003(hadoop) groups=1003(hadoop)
[INFO] 2023-05-16 19:01:05 com.datasophon.common.utils.ShellUtils:[169] - usermod: no changes
[INFO] 2023-05-16 19:01:05 com.datasophon.common.utils.ShellUtils:[145] - script execute success
[INFO] 2023-05-16 19:01:05 com.datasophon.common.utils.ShellUtils:[145] - script execute success
[INFO] 2023-05-16 19:01:05 com.datasophon.common.utils.ShellUtils:[169] - uid=1004(elastic) gid=1004(elastic) groups=1004(elastic)
[INFO] 2023-05-16 19:01:05 com.datasophon.common.utils.ShellUtils:[145] - script execute success
[INFO] 2023-05-16 19:01:05 com.datasophon.common.utils.ShellUtils:[169] - usermod: no changes
[INFO] 2023-05-16 19:01:05 com.datasophon.common.utils.ShellUtils:[169] - uid=1005(hdfs) gid=1003(hadoop) groups=1003(hadoop)
[INFO] 2023-05-16 19:01:05 com.datasophon.common.utils.ShellUtils:[145] - script execute success
[INFO] 2023-05-16 19:01:05 com.datasophon.common.utils.ShellUtils:[145] - script execute success
[INFO] 2023-05-16 19:01:05 com.datasophon.common.utils.ShellUtils:[169] - usermod: no changes
[INFO] 2023-05-16 19:01:05 com.datasophon.common.utils.ShellUtils:[169] - uid=1006(yarn) gid=1003(hadoop) groups=1003(hadoop)
[INFO] 2023-05-16 19:01:05 com.datasophon.common.utils.ShellUtils:[145] - script execute success
[INFO] 2023-05-16 19:01:05 com.datasophon.common.utils.ShellUtils:[169] - usermod: no changes
[INFO] 2023-05-16 19:01:05 com.datasophon.common.utils.ShellUtils:[145] - script execute success
[INFO] 2023-05-16 19:01:05 com.datasophon.common.utils.ShellUtils:[169] - uid=1007(mapred) gid=1003(hadoop) groups=1003(hadoop)
[INFO] 2023-05-16 19:01:05 com.datasophon.common.utils.ShellUtils:[145] - script execute success
[INFO] 2023-05-16 19:01:05 com.datasophon.common.utils.ShellUtils:[169] - usermod: no changes
[INFO] 2023-05-16 19:01:05 com.datasophon.common.utils.ShellUtils:[145] - script execute success
[INFO] 2023-05-16 19:01:05 com.datasophon.common.utils.ShellUtils:[169] - uid=1008(hbase) gid=1003(hadoop) groups=1003(hadoop)
[INFO] 2023-05-16 19:01:05 com.datasophon.common.utils.ShellUtils:[145] - script execute success
[INFO] 2023-05-16 19:01:05 com.datasophon.common.utils.ShellUtils:[169] - usermod: no changes
[INFO] 2023-05-16 19:01:05 com.datasophon.common.utils.ShellUtils:[145] - script execute success
[INFO] 2023-05-16 19:01:05 com.datasophon.common.utils.ShellUtils:[67] - The script returned the following data: {coreNum: 12, totalMem: 31.2607, totalDisk: 492.06}
[INFO] 2023-05-16 19:01:05 com.datasophon.worker.WorkerApplicationServer:[155] - host info collect result:com.datasophon.common.utils.ExecResult@7749bf93
[INFO] 2023-05-16 19:01:05 com.datasophon.worker.WorkerApplicationServer:[93] - start worker
[INFO] 2023-05-16 19:01:05 com.datasophon.worker.actor.RemoteEventActor:[39] - akka.tcp://datasophon@p-mn-01:2552-->akka.tcp://datasophon@p-mn-01:2551 associated
[WARN] 2023-05-16 19:01:05 akka.serialization.Serialization(akka://datasophon):[78] - Using the default Java serializer for class [com.datasophon.common.model.StartWorkerMessage] which is not recommended because of performance implications. Use another serializer or disable this warning using the setting 'akka.actor.warn-about-java-serializer-usage'
[INFO] 2023-05-16 19:03:53 com.datasophon.common.utils.ShellUtils:[96] - The script returned the following data: x86_64
[INFO] 2023-05-16 19:03:53 com.datasophon.common.utils.ShellUtils:[169] - stopping node
[INFO] 2023-05-16 19:03:56 com.datasophon.common.utils.ShellUtils:[169] - End stop node.
[INFO] 2023-05-16 19:03:56 com.datasophon.common.utils.ShellUtils:[145] - script execute success
[INFO] 2023-05-16 19:03:56 com.datasophon.worker.WorkerApplicationServer:[166] - Worker server stopped

How to reproduce

Follow these steps to deploy version 1.1.1:

https://datasophon.github.io/datasophon-website/docs/current/%E4%BD%BF%E7%94%A8%E6%89%8B%E5%86%8C/%E5%88%9B%E5%BB%BA%E9%9B%86%E7%BE%A4

Anything else

No response

Version

dev

Are you willing to submit PR?

  • [ ] Yes I am willing to submit a PR!

Code of Conduct

WillJiang12 avatar May 16 '23 11:05 WillJiang12

Thank you for your feedback. We have received your issue; please wait patiently for a reply.

  • So that we can understand your request as quickly as possible, please provide detailed information, the version, or screenshots.

github-actions[bot] avatar May 16 '23 11:05 github-actions[bot]

I saw the following error in the log. Please refer to the log above for the complete output.

[INFO] 2023-05-16 19:01:05 com.datasophon.worker.actor.RemoteEventActor:[39] - akka.tcp://datasophon@p-mn-01:2552-->akka.tcp://datasophon@p-mn-01:2551 associated
[WARN] 2023-05-16 19:01:05 akka.serialization.Serialization(akka://datasophon):[78] - Using the default Java serializer for class [com.datasophon.common.model.StartWorkerMessage] which is not recommended because of performance implications. Use another serializer or disable this warning using the setting 'akka.actor.warn-about-java-serializer-usage'

WillJiang12 avatar May 16 '23 11:05 WillJiang12
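
When a worker log is pasted as one long run of text, as in the excerpt above, the entries that are not plain INFO are easy to miss. The following is a minimal Python sketch, not part of DataSophon, that splits such text back into entries and keeps only WARN and ERROR lines. The helper names `split_entries` and `above_info` are hypothetical, and the regular expression assumes the `[LEVEL] yyyy-MM-dd HH:mm:ss` prefix seen in this log.

```python
import re

# Zero-width match at the start of each entry, e.g. "[INFO] 2023-05-16 19:01:05 ".
# Assumption: every entry begins with a level tag followed by a timestamp.
ENTRY_START = re.compile(
    r"(?=\[(?:DEBUG|INFO|WARN|ERROR)\] \d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} )"
)


def split_entries(text):
    """Split concatenated log text into one string per log entry."""
    return [part.strip() for part in ENTRY_START.split(text) if part.strip()]


def above_info(entries):
    """Keep only the entries logged at WARN or ERROR level."""
    return [entry for entry in entries if entry.startswith(("[WARN]", "[ERROR]"))]


if __name__ == "__main__":
    # Two entries taken from the log in this issue, joined the way they were pasted.
    pasted = (
        "[INFO] 2023-05-16 19:01:05 com.datasophon.worker.WorkerApplicationServer:[93] - start worker "
        "[WARN] 2023-05-16 19:01:05 akka.serialization.Serialization(akka://datasophon):[78] - "
        "Using the default Java serializer for class [com.datasophon.common.model.StartWorkerMessage] "
        "which is not recommended because of performance implications."
    )
    for entry in above_info(split_entries(pasted)):
        print(entry)
```

In this log the only non-INFO entry is the serializer warning, which concerns performance rather than a failed operation.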

The hostname does not match the one you entered. Please read the deployment documentation before proceeding:
[INFO] 2023-05-16 19:01:05 com.datasophon.worker.actor.RemoteEventActor:[39] - akka.tcp://datasophon@p-mn-01:2552-->akka.tcp://datasophon@p-mn-01:2551 associated

datasophon avatar May 16 '23 14:05 datasophon
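
One way to check for the hostname mismatch described above is to compare the name entered when adding the host with the name the machine itself reports. This is a minimal Python sketch under that assumption, not a DataSophon tool; `hostname_matches` is a hypothetical helper, and `p-mn-01` is simply the hostname that appears in the log.

```python
import socket
from typing import Optional


def hostname_matches(entered: str, actual: Optional[str] = None) -> bool:
    """Return True when the entered hostname equals the machine's hostname.

    The comparison ignores case and surrounding whitespace. When `actual`
    is omitted, the local hostname from socket.gethostname() is used.
    """
    if actual is None:
        actual = socket.gethostname()
    return entered.strip().lower() == actual.strip().lower()


if __name__ == "__main__":
    entered = "p-mn-01"  # the name typed in during host registration (assumed)
    print("local hostname:", socket.gethostname())
    print("matches entered name:", hostname_matches(entered))
```

Run it on the worker node itself, since the comparison uses that machine's own hostname.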