Stephen
Hi, I've run into a problem: I can't submit the job to YARN. It can't find the deepfm.pt file on HDFS. Could you please take a look? The script configuration is as follows:

```
#!/bin/bash
input=hdfs://xxxx-1v/home/yarn/pytorch-on-angel/census_148d_train.libsvm.tmp
output=hdfs://xxxx-1v/home/hdp/jia/angel/model/20191231_louvain/
source ./spark-on-angel-env.sh
echo "------------------"
#JAVA_LIBRARY_PATH=/home/work/software/java/lib
JAVA_LIBRARY_PATH=/home/work/software/angel/lib:/home/work/software/java/lib
echo $JAVA_LIBRARY_PATH
$SPARK_HOME/bin/spark-submit \
  --conf spark.ps.instances=5 \
  --conf spark.ps.cores=1 \
  --conf spark.ps.jars=$SONA_ANGEL_JARS \
  --conf spark.ps.memory=5g \
  --conf spark.ps.log.level=INFO...
```
After switching to yarn-cluster mode, it reports the following error:
Unzipping torchlib.zip gives a lib directory, and lib contains many .a files.
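For anyone checking the same thing, here is a rough sketch for summarizing what kind of libraries an archive like torchlib.zip contains without unpacking it. It rests on an assumption that is not confirmed anywhere in this thread: that the native Torch libraries need to be shared objects (`.so`) for the JVM to load them at runtime, so an archive holding only static `.a` files may be worth a second look. The function name `classify_libs` is made up for illustration.

```shell
#!/bin/bash
# Hypothetical helper: reads archive entry names (one per line) on stdin and
# counts shared libraries (.so, .so.N), static archives (.a), and everything else.
classify_libs() {
  awk '
    /\/$/              { next }               # skip directory entries such as "lib/"
    /\.so(\.[0-9]+)*$/ { n_shared++; next }   # libfoo.so, libfoo.so.1, ...
    /\.a$/             { n_static++; next }   # static archives
                       { n_other++ }          # headers, text files, etc.
    END { printf "shared=%d static=%d other=%d\n", n_shared, n_static, n_other }
  '
}

# Usage (assumes torchlib.zip is in the current directory):
#   unzip -Z1 torchlib.zip | classify_libs
```

`unzip -Z1` prints only the entry names, one per line, which keeps the parsing trivial. If the result shows `shared=0`, that would match the description above of a lib directory holding only `.a` files.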
Hi, in the end I set up the environment on every node in the cluster, and yarn-client mode now works. Occasionally, though, it reports the error below. What causes it?

```
21/01/13 13:07:37 INFO Client: Application report for application_1609301285435_0588 (state: ACCEPTED)
21/01/13 13:07:38 INFO Client: Application report for application_1609301285435_0588 (state: ACCEPTED)
21/01/13 13:07:39 INFO Client: Application report for application_1609301285435_0588...
```