intel-extension-for-transformers
An error occurred during DPO on NVIDIA GPU
I changed some parameters in the training code as instructed, but when I run DPO on 8×A6000 GPUs I get the errors below. If I understand correctly, Habana is only used for HPU training.
Traceback (most recent call last):
File "/data1/yoyo/intel-extension-for-transformers/intel_extension_for_transformers/neural_chat/examples/finetuning/dpo_pipeline/dpo_clm.py", line 219, in
The command I ran:
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python dpo_clm.py \
  --model_name_or_path "/data1/yoyo/intel-extension-for-transformers/data/Mistral-7B-v0.1" \
  --output_dir "/data1/yoyo/intel-extension-for-transformers/out/dpo_test" \
  --per_device_train_batch_size 1 \
  --gradient_accumulation_steps 8 \
  --learning_rate 5e-4 \
  --max_steps 1000 \
  --save_steps 10 \
  --lora_alpha 16 \
  --lora_rank 16 \
  --lora_dropout 0.05 \
  --dataset_name Intel/orca_dpo_pairs \
  --bf16 \
  --use_auth_token True \
  --use_habana False \
  --use_lazy_mode False \
  --device "auto"
Also, when I run SFT (finetune_neuralchat_v3.py), accelerate is automatically set to CPU:
[INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cpu (auto detect)
"No device has been set. Use either --use_habana to run on HPU or --no_cuda to run on CPU."
Operating system: CentOS 7
Python: 3.10
torch: 2.1.0
CUDA: 12.2
optimum-habana: 1.9.0
transformers: 4.34.1
accelerate: 0.25.0
Hi,
- For NVIDIA GPUs you don't need to install optimum-habana; the code calls 'is_optimum_habana_available()' to check for a Habana device. So you can uninstall that package, and you don't need to set "--use_habana" or "--use_lazy_mode".
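Conceptually, that availability check boils down to whether the optional package can be imported — a minimal sketch of the idea, not the library's exact code:

```python
import importlib.util

def is_optimum_habana_available():
    # Sketch: report True only when the optional optimum-habana
    # package is importable. Once it is uninstalled on an
    # NVIDIA-only machine, this returns False and the HPU code
    # path is skipped entirely.
    if importlib.util.find_spec("optimum") is None:
        # Checking the parent first avoids ModuleNotFoundError
        # when querying the "optimum.habana" submodule.
        return False
    return importlib.util.find_spec("optimum.habana") is not None
```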
- "DPOTrainer" inherits from the Hugging Face transformers "Trainer", so device selection works the same way: if the environment has a GPU, the code detects it and uses it; if "--use_cpu" is set, the code runs on the CPU.
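The fallback order described above can be sketched with a small hypothetical helper (the real logic lives inside transformers' TrainingArguments/Trainer; `resolve_device` is just an illustration):

```python
def resolve_device(cuda_available: bool, use_cpu: bool = False) -> str:
    # Simplified mirror of Trainer's behavior: an explicit
    # --use_cpu flag wins; otherwise use CUDA when a GPU is
    # visible, else fall back to the CPU.
    if use_cpu or not cuda_available:
        return "cpu"
    return "cuda"
```

So on the 8×A6000 box, with optimum-habana uninstalled and no "--use_cpu" flag, training should land on CUDA automatically.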
Thanks~
Hi, I will close this issue if there are no further concerns.