
Error when running keras-horovod

Open lixiang-repo opened this issue 1 year ago • 1 comments

System information

  • OS Platform and Distribution (e.g., Linux fedora 2023):
  • TensorFlow version and how it was installed (source or binary): pip3 v2.15.1
  • TensorFlow-Recommenders-Addons version and how it was installed (source or binary): pip 0.7.0
  • Python version: 3.10.14
  • Is GPU used? (yes/no): no

Describe the bug

Training runs normally with the following command: horovodrun -np 1 python test.py --mode="train" --model_dir="./model_dir" --export_dir="./export_dir" However, changing -np to 2 or more causes an error.

Other info / logs

log.txt

lixiang-repo avatar Sep 16 '24 09:09 lixiang-repo

Judging from the error message, you could check whether the process was killed by the system because it ran out of memory.

MoFHeka avatar Nov 15 '24 23:11 MoFHeka