Question: How does DLRover integrate with Llama Factory?
My first instinct was to modify examples/pytorch/nanogpt/elastic_job.yaml:
command:
  - /bin/bash
  - -c
  - "dlrover-run --network-check --nnodes=$NODE_NUM
    --nproc_per_node=1 --max_restarts=1
    ./examples/pytorch/nanogpt/train.py
    --data_dir /data/nanogpt/"
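I changed it to the form below, but it fails with an error saying the file llamafactory-cli cannot be found. Does this mean dlrover-run must be followed by a train.py file?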
command:
  - /bin/bash
  - -c
  - "dlrover-run --network-check --nnodes=$NODE_NUM
    --nproc_per_node=1 --max_restarts=1
    llamafactory-cli train examples/train_lora/llama3_lora_sft.yaml"
You should encapsulate the usage of your CLI within your training script.
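For illustration, a minimal sketch of such a wrapper script follows. The error above suggests that dlrover-run treats its first positional argument as a Python script path, so the idea is to hand it a small script that calls into Llama Factory directly. The file name `train_llamafactory.py` and the helper `config_from_argv` are hypothetical. The import `llamafactory.train.tuner.run_exp`, and the assumption that `run_exp()` without arguments reads the YAML path from `sys.argv`, are based on what Llama Factory's own `src/train.py` appears to do; please verify both against your installed version.

```python
# train_llamafactory.py (hypothetical file name)
# Thin wrapper so that dlrover-run receives a Python script instead of the
# llamafactory-cli executable.
import sys


def config_from_argv(argv):
    """Return the YAML config path given as the first argument, or None."""
    if len(argv) >= 2 and argv[1].endswith((".yaml", ".yml")):
        return argv[1]
    return None


def main():
    if config_from_argv(sys.argv) is None:
        print("usage: train_llamafactory.py <config.yaml>", file=sys.stderr)
        return
    # Imported lazily so the usage message works even without Llama Factory.
    # Assumption: this is the entry point used by Llama Factory's src/train.py.
    from llamafactory.train.tuner import run_exp

    # Assumption: with no arguments, run_exp() parses the config from sys.argv.
    run_exp()


if __name__ == "__main__":
    main()
```

With a script like this placed in the job image, the last item of the command would become `dlrover-run --network-check --nnodes=$NODE_NUM --nproc_per_node=1 --max_restarts=1 ./train_llamafactory.py examples/train_lora/llama3_lora_sft.yaml`. Calling the Python entry point directly, rather than spawning `llamafactory-cli` in a subprocess, keeps the training code in the process that dlrover-run launches and monitors.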