Stephen

Results: 5 comments by Stephen

Hello, I've run into a problem: the job won't submit to YARN because it can't find the deepfm.pt file on HDFS. Could you please take a look? The script configuration is as follows:

```
#!/bin/bash
input=hdfs://xxxx-1v/home/yarn/pytorch-on-angel/census_148d_train.libsvm.tmp
output=hdfs://xxxx-1v/home/hdp/jia/angel/model/20191231_louvain/
source ./spark-on-angel-env.sh
echo "------------------"
#JAVA_LIBRARY_PATH=/home/work/software/java/lib
JAVA_LIBRARY_PATH=/home/work/software/angel/lib:/home/work/software/java/lib
echo $JAVA_LIBRARY_PATH
$SPARK_HOME/bin/spark-submit \
  --conf spark.ps.instances=5 \
  --conf spark.ps.cores=1 \
  --conf spark.ps.jars=$SONA_ANGEL_JARS \
  --conf spark.ps.memory=5g \
  --conf spark.ps.log.level=INFO...
```
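One way to narrow this down is to confirm, before calling `spark-submit`, that each HDFS path the job reads actually exists; if the check passes, the problem is more likely in how the path is handed to the job than in the file itself. The sketch below is a minimal helper built on `hdfs dfs -test -e`, which exits 0 only when the path exists. The function name `require_hdfs_file` is hypothetical, and the location of deepfm.pt is not shown in the truncated script, so you would substitute your own path.

```bash
# Hypothetical pre-flight helper: fail fast if an HDFS path is missing.
# Relies on `hdfs dfs -test -e`, which returns 0 only if the path exists.
require_hdfs_file() {
  local path="$1"
  if ! hdfs dfs -test -e "$path"; then
    echo "missing on HDFS: $path" >&2
    return 1
  fi
}
```

For example, call `require_hdfs_file "$input" || exit 1` right after `input=` is set, and do the same for the deepfm.pt path before the `spark-submit` line.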

After switching to yarn-cluster mode, I get the following errors:

![image](https://user-images.githubusercontent.com/30521047/103617918-01c8da00-4f6a-11eb-9a04-3b61f4979d8d.png)
![image](https://user-images.githubusercontent.com/30521047/103617602-7fd8b100-4f69-11eb-833e-98f77f9af8da.png)

Unzipping torchlib.zip gives a lib directory, and lib contains many .a files:

![image](https://user-images.githubusercontent.com/30521047/103631357-d64fea80-4f7d-11eb-819e-8d60d2225f1e.png)
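A static archive (`.a`) is linked into a binary at build time; the JVM cannot load it at runtime through `System.loadLibrary`, which needs a shared object (`.so` on Linux). If the job loads the Torch native code at runtime (an assumption here, not something the thread confirms), a quick way to see what the archive actually ships is to count shared versus static libraries in its listing. The helper name `classify_libs` is hypothetical.

```bash
# Hypothetical helper: read archive entry names on stdin and count
# shared libraries (.so, .so.N) versus static archives (.a).
classify_libs() {
  local shared=0 static=0 name
  while IFS= read -r name; do
    case "$name" in
      *.so | *.so.*) shared=$((shared + 1)) ;;
      *.a) static=$((static + 1)) ;;
    esac
  done
  echo "shared=$shared static=$static"
}
```

Usage: `unzip -Z1 torchlib.zip | classify_libs` (zipinfo mode with `-1` prints one entry name per line). A result of `shared=0` would mean the archive contains only static libraries.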

Hello, I eventually configured the environment on every node in the cluster, and yarn-client mode works now. But this error still occurs occasionally. What could be causing it?

```
21/01/13 13:07:37 INFO Client: Application report for application_1609301285435_0588 (state: ACCEPTED)
21/01/13 13:07:38 INFO Client: Application report for application_1609301285435_0588 (state: ACCEPTED)
21/01/13 13:07:39 INFO Client: Application report for application_1609301285435_0588...
```
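In YARN, `ACCEPTED` means the scheduler has accepted the application but it has not started running yet; an application that stays there is typically still waiting for its ApplicationMaster container, for example because the queue is busy. Whether that is the cause here can't be told from the truncated log. As a first diagnostic, the sketch below summarizes the client's "Application report" lines: the last state seen per application and how many reports showed `ACCEPTED`. The function name `summarize_reports` is hypothetical; it only assumes the line format shown above.

```bash
# Hypothetical helper: summarize Spark YARN client "Application report"
# lines from stdin: last reported state per application, plus how many
# reports showed ACCEPTED.
summarize_reports() {
  awk '
    /Application report for application_/ {
      for (i = 1; i <= NF; i++) {
        if ($i ~ /^application_/) app = $i
        if ($i == "(state:" && i < NF) {
          st = $(i + 1)
          sub(/\).*/, "", st)      # "ACCEPTED)" -> "ACCEPTED"
          last[app] = st
          if (st == "ACCEPTED") acc[app]++
        }
      }
    }
    END { for (a in last) printf "%s last=%s accepted_reports=%d\n", a, last[a], acc[a] }
  '
}
```

For example, capture the submit output with `... 2>&1 | tee submit.log` and run `summarize_reports < submit.log`. For an application stuck in `ACCEPTED`, `yarn application -status <appId>` and the ResourceManager web UI show its queue and diagnostics.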

> > Hello, I eventually configured the environment on every node in the cluster, and yarn-client mode works now. But this error still occurs occasionally. What could be causing it?
> >
> > ```
> > 21/01/13 13:07:37 INFO Client: Application report for application_1609301285435_0588 (state: ACCEPTED)
> > 21/01/13 13:07:38 INFO Client: Application report for application_1609301285435_0588 (state: ACCEPTED)...
> > ```