heting-bes
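My first instinct was to modify `examples/pytorch/nanogpt/elastic_job.yaml`, which contains:

```yaml
command:
  - /bin/bash
  - -c
  - "dlrover-run --network-check --nnodes=$NODE_NUM \
     --nproc_per_node=1 --max_restarts=1 \
     ./examples/pytorch/nanogpt/train.py \
     --data_dir /data/nanogpt/"
```

I changed it to the form below, but it fails with an error saying the file `llamafactory-cli` cannot be found. Does this mean the command must be followed by a `train.py` file?

```yaml
command:
  - /bin/bash
  - -c
  - "dlrover-run --network-check --nnodes=$NODE_NUM \...
```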
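One plausible cause, sketched under an assumption: if `dlrover-run` assembles its worker command the way PyTorch's `torchrun` does, then unless `--no-python` is passed, the entrypoint is launched as `python -u <entrypoint> ...`. A console command such as `llamafactory-cli` would then be treated as a script path relative to the working directory, which would produce a "file not found" error. The helper `build_worker_cmd` below is hypothetical, not a dlrover API, and whether `dlrover-run` accepts `--no-python` is unverified.

```python
import sys
from typing import List


def build_worker_cmd(entrypoint: str, script_args: List[str], no_python: bool = False) -> List[str]:
    """Hypothetical helper mirroring how torchrun builds a worker command.

    Assumption: dlrover-run behaves like torchrun here; this is not dlrover's code.
    """
    if no_python:
        # Run the entrypoint directly as an executable (e.g. a CLI found on PATH).
        return [entrypoint, *script_args]
    # Default: run the entrypoint as a Python script with the current interpreter,
    # so "llamafactory-cli" would be opened as a file path, not looked up on PATH.
    return [sys.executable, "-u", entrypoint, *script_args]


if __name__ == "__main__":
    print(build_worker_cmd("./examples/pytorch/nanogpt/train.py", ["--data_dir", "/data/nanogpt/"]))
    print(build_worker_cmd("llamafactory-cli", [], no_python=True))
```

If that assumption holds, the entrypoint would not need to be named `train.py`; it would only need to be a Python file, unless the launcher is told to skip the interpreter. Running `dlrover-run --help` should show whether such an option is available.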