XLearning

AI on Hadoop

44 XLearning issues, sorted by recently updated

Application application_1541471478754_0001 failed 2 times due to AM Container for appattempt_1541471478754_0001_000002 exited with exitCode: 1 Failing this attempt.Diagnostics: [2018-11-06 10:32:36.870]Exception from container-launch. Container id: container_1541471478754_0001_02_000001 Exit code: 1 [2018-11-06 10:32:36.878]Container...

Does XLearning support TensorFlow 2.0? If not, do you know of any other scheduling platform that supports TF 2.0?

![image](https://user-images.githubusercontent.com/56247897/126369990-a4db7982-95f2-46cd-91f8-1e8120703c2a.png) cluster = json.loads(os.environ["TF_CLUSTER_DEF"]) returns the following: {'ps': ['hadoop00:24383'], 'worker': ['hadoop01:22676', 'hdoop02:27181']} One of the TF_CONFIG values is: {"cluster": {"worker": ["hadoop00:22676", "hadoop002:27181 ", "hadoop03:24383"]}, "task": {"type": "worker", "index": 0}} Please help, I have been struggling with this for more than a week.
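One way to narrow down a report like this is to compare the addresses in the two structures directly. The following is only a sketch, not XLearning's or TensorFlow's own code: `build_tf_config` and `diff_addresses` are hypothetical helper names, and the role and index passed in are assumed to come from wherever the job gets them. It builds a TF_CONFIG-shaped dict from a cluster definition and lists the addresses that appear on only one side.

```python
import json


def build_tf_config(cluster_def, role, index):
    """Build a TF_CONFIG-shaped dict from a cluster definition (hypothetical helper)."""
    if role not in cluster_def:
        raise ValueError(f"role {role!r} not in cluster definition")
    if not 0 <= index < len(cluster_def[role]):
        raise ValueError(f"index {index} out of range for role {role!r}")
    # Strip stray whitespace from each address; build a new dict so the input is untouched.
    cluster = {job: [addr.strip() for addr in addrs]
               for job, addrs in cluster_def.items()}
    return {"cluster": cluster, "task": {"type": role, "index": index}}


def diff_addresses(cluster_def, tf_config):
    """Return (only_in_cluster_def, only_in_tf_config) as sorted address lists."""
    in_def = {a.strip() for addrs in cluster_def.values() for a in addrs}
    in_cfg = {a.strip() for addrs in tf_config["cluster"].values() for a in addrs}
    return sorted(in_def - in_cfg), sorted(in_cfg - in_def)


if __name__ == "__main__":
    # Values copied verbatim from the report above.
    cluster_def = {"ps": ["hadoop00:24383"],
                   "worker": ["hadoop01:22676", "hdoop02:27181"]}
    reported = {"cluster": {"worker": ["hadoop00:22676", "hadoop002:27181 ",
                                       "hadoop03:24383"]},
                "task": {"type": "worker", "index": 0}}

    only_def, only_cfg = diff_addresses(cluster_def, reported)
    print("only in TF_CLUSTER_DEF:", only_def)
    print("only in TF_CONFIG:", only_cfg)
    print(json.dumps(build_tf_config(cluster_def, "worker", 0)))
```

With the values as pasted in the report, no host:port pair matches between the two structures, which suggests the TF_CONFIG was assembled by hand or from a different run rather than derived from the same TF_CLUSTER_DEF.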

![image](https://user-images.githubusercontent.com/56247897/126117493-fd38b84e-0e94-4e51-a5c9-cd263ffb1fff.png)

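After I run examples/tensorflow/run.sh, the client keeps printing Application report for application_1612431077961_007(state:ACCEPTED). It has been running for more than a day and still shows the same thing. Is this normal or not?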

https://github.com/Qihoo360/XLearning/blob/749b8a9e90140f0825709b71ffba128e9e55b098/pom.xml#L13 CVE-2017-15718, CVE-2018-1296. Recommended upgrade version: 2.10.0

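A question: with XLearning 1.1, running the TensorFlow demo, the logs show that all workers have finished training, but only the container with task_index = 0 has its status updated to success. The other containers stay in running, and the logs show no further output. Also, no matter what worker-num is set to, are all workers started on a single machine?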

Is Hadoop 3.1 supported?