LLaMA-Factory
[PPU] Has anyone tested in a PPU environment?
Reminder
Note: the DPO dataset contains only 850 samples.
- [X] I have read the README and searched the existing issues.
System Info
```yaml
### model
model_name_or_path: /mnt/ant-cc/yungui.zs/project/IdentifyRequest/checkpoint/factory_qwen14bchat_data72w+choice7w+aq8.3w_ep4_batch16_sft_full_lr5e5/checkpoint-5400/

### method
stage: dpo
do_train: true
finetuning_type: full
lora_target: all
pref_beta: 0.1
pref_loss: simpo

### dataset
dataset: train_dpo_llama_factory
dataset_dir: /dpo_data_llama_factory_data/
template: qwen
cutoff_len: 1024
max_samples: 1000
overwrite_cache: true
preprocessing_num_workers: 16

### output
output_dir: model_output_path
logging_steps: 10
save_steps: 500
plot_loss: true
overwrite_output_dir: true

### train
per_device_train_batch_size: 8
gradient_accumulation_steps: 1
learning_rate: 5.0e-6
num_train_epochs: 100
lr_scheduler_type: cosine
warmup_ratio: 0.1
fp16: true
ddp_timeout: 180000000

### eval
val_size: 0.05
per_device_eval_batch_size: 1
eval_strategy: epoch
eval_steps: 500
```
Reproduction
The GPUs in my work environment are PPU-type accelerators. LLaMA-Factory used to work fine (launched via train_bash.py), but after switching to the llamafactory-cli launcher for DPO training, I found that PPU utilization stays at 0 and the model silently falls back to training on the CPU. Has anyone run this on PPUs? I'm looking for a dependency setup that works in a PPU environment. Any suggestions? 🙏
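Not a fix, but a minimal diagnostic sketch that may help narrow down whether the training process can see the accelerator at all before blaming the launcher. It assumes a PyTorch-based stack where the PPU plugin registers through the CUDA-style device API; the function name `report_accelerator_env` is hypothetical and the environment-variable names are just common conventions:

```python
import importlib.util
import os


def report_accelerator_env():
    """Collect basic signals about whether a training process will see an accelerator.

    Works even when torch is not installed: the torch-specific checks are skipped.
    """
    info = {
        # Device-masking variables; "<unset>" means the variable is absent,
        # while an empty string would actively hide all devices.
        "CUDA_VISIBLE_DEVICES": os.environ.get("CUDA_VISIBLE_DEVICES", "<unset>"),
        "torch_installed": importlib.util.find_spec("torch") is not None,
    }
    if info["torch_installed"]:
        import torch

        # If the PPU runtime registers via the CUDA-style API, these should
        # report the devices; 0 here explains a silent CPU fallback.
        info["cuda_available"] = torch.cuda.is_available()
        info["device_count"] = torch.cuda.device_count()
    return info


if __name__ == "__main__":
    for key, value in report_accelerator_env().items():
        print(f"{key}: {value}")
```

If this script reports zero visible devices under llamafactory-cli but not under the old train_bash.py launch, the difference is likely in the environment the CLI spawns (masked device variables or a different Python environment) rather than in the training code itself.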
Expected behavior
No response
Others
No response