
Question: How does DLRover integrate with Llama Factory?

Open heting-bes opened this issue 1 year ago • 1 comments

heting-bes avatar Aug 21 '24 06:08 heting-bes

My first instinct was to modify examples/pytorch/nanogpt/elastic_job.yaml:

command:
  - /bin/bash
  - -c
  - "dlrover-run --network-check --nnodes=$NODE_NUM
    --nproc_per_node=1 --max_restarts=1
    ./examples/pytorch/nanogpt/train.py
    --data_dir /data/nanogpt/"

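I changed it to the form below, and it fails with an error saying the file llamafactory-cli cannot be found. Does this mean dlrover-run must be followed by a train.py file?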

command:
  - /bin/bash
  - -c
  - "dlrover-run --network-check --nnodes=$NODE_NUM
    --nproc_per_node=1 --max_restarts=1
    llamafactory-cli train examples/train_lora/llama3_lora_sft.yaml"

heting-bes avatar Aug 21 '24 06:08 heting-bes

You should encapsulate the usage of your CLI within your training script, since dlrover-run expects a script path as its entry point.
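One way to read that advice is a thin Python wrapper that dlrover-run can launch in place of llamafactory-cli. The sketch below is an illustration only, not a method confirmed by the DLRover or LLaMA-Factory maintainers. The file name train_wrapper.py and the helper resolve_config are hypothetical, and the import of run_exp from llamafactory.train.tuner assumes an installed LLaMA-Factory version that exposes that entry point and reads a YAML config path from sys.argv.

```python
# train_wrapper.py (hypothetical file name)
import sys


def resolve_config(argv):
    """Return the YAML config path given on the command line, or None."""
    if len(argv) == 2 and argv[1].endswith((".yaml", ".yml")):
        return argv[1]
    return None


def main(argv):
    config = resolve_config(argv)
    if config is None:
        print("usage: train_wrapper.py <config.yaml>", file=sys.stderr)
        return
    # Assumption: the installed LLaMA-Factory exposes run_exp at this path
    # and, when called without arguments, reads the config from sys.argv.
    from llamafactory.train.tuner import run_exp

    sys.argv = [argv[0], config]
    run_exp()


if __name__ == "__main__":
    main(sys.argv)
```

Under those assumptions, the last line of the job command would point at the wrapper instead of the CLI, for example ./train_wrapper.py examples/train_lora/llama3_lora_sft.yaml, with the other dlrover-run flags left as they were.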

BalaBalaYi avatar Nov 27 '24 09:11 BalaBalaYi

This issue has been automatically marked as stale because it has not had recent activity.

github-actions[bot] avatar Feb 26 '25 01:02 github-actions[bot]

This issue is being automatically closed due to inactivity.

github-actions[bot] avatar Mar 05 '25 01:03 github-actions[bot]