Ouyang Wen
Take a look at the PS GC log page. Do you see frequent full GC entries?
> 21/11/24 17:14:47 INFO RunningContext: =====================Server running context start=======================
> 21/11/24 17:14:47 INFO RunningContext: state = IDLE
> 21/11/24 17:14:47 INFO RunningContext: totalRunningRPCCounter = 1
> 21/11/24 17:14:47 INFO RunningContext: infligtingRPCCounter = 0
> ...
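For reference, a quick way to check whether full GCs are frequent once you have the raw PS GC log file. This is a minimal sketch; `ps-gc.log` is an assumed path, so point it at wherever your PS actually writes its GC log:

```sh
# Count full-GC events and show the most recent ones with their pause times.
# "ps-gc.log" is an illustrative path, not a fixed SONA location.
grep -c "Full GC" ps-gc.log
grep "Full GC" ps-gc.log | tail -n 5
```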
Can it run successfully now after increasing the PS memory?
> angel.netty.matrixtransfer.max.message.size

Use this parameter like this: --conf spark.hadoop.angel.netty.matrixtransfer.max.message.size=

Also, reduce the number of PS partitions to 80 and the batch size to 50, then resubmit (see the sketch below).
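A minimal sketch of the resubmission with these changes applied. The argument names `psPartitionNum` and `batchSize` are assumptions here, since the exact names depend on which SONA algorithm is being run; the message-size value is the one from the script quoted below:

```sh
# Sketch only: psPartitionNum/batchSize are assumed argument names --
# substitute the parameter names your SONA algorithm actually documents.
source ./bin/spark-on-angel-env.sh
$SPARK_HOME/bin/spark-submit --master yarn-cluster \
  --conf spark.hadoop.angel.netty.matrixtransfer.max.message.size=1073741824 \
  ... \
  psPartitionNum:80 batchSize:50
```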
> The launch script that produced the error above is:
>
> source ./bin/spark-on-angel-env.sh
> $SPARK_HOME/bin/spark-submit --master yarn-cluster \
>   --conf spark.ps.instances=8 \
>   --conf spark.ps.cores=4 \
>   --conf spark.ps.jars=$SONA_ANGEL_JARS \
>   --conf spark.ps.memory=10g \
>   --jars $SONA_SPARK_JARS \
>   --driver-memory 30g \
>   --num-executors 8 \
>   --verbose \
>   --executor-cores 4 \
>   --executor-memory 15g \
>   --conf spark.default.parallelism=5000 \
>   --conf spark.hadoop.angel.netty.matrixtransfer.max.message.size=1073741824 \
>   ...
Check the Angel master log for error messages; it looks like the job died from a timeout.
Also check the Spark executor logs to see how long pulling one batch takes.
> Is it this log below?
>
> 21/11/26 09:11:51 INFO TaskSetManager: Finished task 45.0 in stage 0.0 (TID 45) in 17699 ms on (executor 7) (42/56)
> 21/11/26 09:11:51 INFO TaskSetManager: Finished task 33.0 in stage...
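For reference, the "in 17699 ms" field in these TaskSetManager lines is the wall-clock duration of that task (about 17.7 s here). A minimal sketch for pulling the executor logs out of YARN and filtering for these lines, assuming log aggregation is enabled; `<applicationId>` is a placeholder:

```sh
# Fetch the aggregated container logs for the application and
# filter for per-task completion times reported by TaskSetManager.
yarn logs -applicationId <applicationId> | grep "Finished task"
```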
You should use the spark-submit options --principal and --keytab when using SONA.
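A minimal sketch of what that looks like on the command line; the principal and keytab path are illustrative placeholders for your own Kerberos credentials:

```sh
# Pass Kerberos credentials so YARN can obtain and renew delegation
# tokens for the long-running job.
# "user@EXAMPLE.COM" and the keytab path are placeholders.
$SPARK_HOME/bin/spark-submit --master yarn-cluster \
  --principal user@EXAMPLE.COM \
  --keytab /path/to/user.keytab \
  ...
```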