DeepSeek-Coder
How to fine-tune on a single GPU?
cd finetune && deepspeed --model_name_or_path $MODEL_PATH --data_path $DATA_PATH --output_dir $OUTPUT_PATH --num_train_epochs 3 --model_max_length 1024 --per_device_train_batch_size 16 --per_device_eval_batch_size 1 --gradient_accumulation_steps 4 --evaluation_strategy "no" --save_strategy "steps" --save_steps 100 --save_total_limit 100 --learning_rate 2e-5 --warmup_steps 10 --logging_steps 1 --lr_scheduler_type "cosine" --gradient_checkpointing True --report_to "tensorboard" --deepspeed configs/ds_config_zero3.json --bf16 True
[2023-12-19 16:10:57,887] [INFO] [] Setting ds_accelerator to cuda (auto detect)
/home/admin/miniconda3/envs/deepseek/lib/python3.10/site-packages/torch/cuda/ UserWarning: CUDA initialization: The NVIDIA driver on your system is too old (found version 11070). Please update your GPU driver by downloading and installing a new version from the URL: Alternatively, go to: to install a PyTorch version that has been compiled with your version of the CUDA driver. (Triggered internally at ../c10/cuda/CUDAFunctions.cpp:108.)
return torch._C._cuda_getDeviceCount() > 0
[2023-12-19 16:11:06,596] [WARNING] [] Unable to find hostfile, will proceed with training with local resources only.
[2023-12-19 16:11:06,596] [INFO] [] cmd = /home/admin/miniconda3/envs/deepseek/bin/python -u -m deepspeed.launcher.launch --world_info=eyJsb2NhbGhvc3QiOiBbMF19 --master_addr= --master_port=29500 --enable_each_rank_log=None --model_name_or_path deepseek-ai/deepseek-coder-6.7b-instruct --data_path ../data/nickroshEvol-Instruct-Code-80k-v1/EvolInstruct-Code-80k.json --output_dir ./outputs --num_train_epochs 3 --model_max_length 1024 --per_device_train_batch_size 16 --per_device_eval_batch_size 1 --gradient_accumulation_steps 4 --evaluation_strategy no --save_strategy steps --save_steps 100 --save_total_limit 100 --learning_rate 2e-5 --warmup_steps 10 --logging_steps 1 --lr_scheduler_type cosine --gradient_checkpointing True --report_to tensorboard --deepspeed configs/ds_config_zero3.json --bf16 True
[2023-12-19 16:11:12,734] [INFO] [] Setting ds_accelerator to cuda (auto detect)
/home/admin/miniconda3/envs/deepseek/lib/python3.10/site-packages/torch/cuda/ UserWarning: CUDA initialization: The NVIDIA driver on your system is too old (found version 11070). Please update your GPU driver by downloading and installing a new version from the URL: Alternatively, go to: to install a PyTorch version that has been compiled with your version of the CUDA driver. (Triggered internally at ../c10/cuda/CUDAFunctions.cpp:108.)
return torch._C._cuda_getDeviceCount() > 0
[2023-12-19 16:11:16,782] [INFO] [] WORLD INFO DICT: {'localhost': [0]}
[2023-12-19 16:11:16,782] [INFO] [] nnodes=1, num_local_procs=1, node_rank=0
[2023-12-19 16:11:16,782] [INFO] [] global_rank_mapping=defaultdict(<class 'list'>, {'localhost': [0]})
[2023-12-19 16:11:16,782] [INFO] [] dist_world_size=1
[2023-12-19 16:11:16,782] [INFO] [] Setting CUDA_VISIBLE_DEVICES=0
/home/admin/miniconda3/envs/deepseek/lib/python3.10/site-packages/torch/cuda/ UserWarning: CUDA initialization: The NVIDIA driver on your system is too old (found version 11070). Please update your GPU driver by downloading and installing a new version from the URL: Alternatively, go to: to install a PyTorch version that has been compiled with your version of the CUDA driver. (Triggered internally at ../c10/cuda/CUDAFunctions.cpp:108.)
return torch._C._cuda_getDeviceCount() > 0
[2023-12-19 16:11:28,688] [INFO] [] Setting ds_accelerator to cuda (auto detect)
[2023-12-19 16:11:30,064] [INFO] [] cdb=None
[2023-12-19 16:11:30,065] [INFO] [] Initializing TorchBackend in DeepSpeed with backend nccl
Traceback (most recent call last):
File "/workspace/workdir/tevs_multi_idc_10g_20220825163730/lyq/DeepSeek-Coder/finetune/", line 193, in
It seems your environment has no gpu device.
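
The repeated UserWarning in the log reports the driver version as 11070. CUDA encodes versions as 1000*major + 10*minor, so this driver supports up to CUDA 11.7, and the warning indicates the installed PyTorch build was compiled against a newer CUDA release. Below is a minimal sketch for checking this before launching DeepSpeed. The helper names `decode_cuda_version` and `driver_supports` are hypothetical, made up for this illustration, and the comparison is only a rough check that ignores CUDA's minor-version compatibility rules.

```python
def decode_cuda_version(code: int) -> str:
    """Convert CUDA's integer encoding (1000*major + 10*minor) to 'major.minor'."""
    major, rest = divmod(code, 1000)
    return f"{major}.{rest // 10}"


def driver_supports(driver_code: int, build_version: str) -> bool:
    """Rough check: is the driver's CUDA version >= the version PyTorch was built with?"""
    driver = tuple(int(p) for p in decode_cuda_version(driver_code).split("."))
    build = tuple(int(p) for p in build_version.split(".")[:2])
    return driver >= build


if __name__ == "__main__":
    # 11070 is the value reported in the warning above.
    print("driver supports CUDA", decode_cuda_version(11070))
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed in this environment")
    else:
        built = torch.version.cuda  # None for CPU-only builds
        print("PyTorch built with CUDA:", built)
        if built is not None:
            print("driver new enough:", driver_supports(11070, built))
        print("cuda available:", torch.cuda.is_available())
        print("device count:", torch.cuda.device_count())
```

If `torch.cuda.is_available()` prints `False` here, the launcher will fail the same way regardless of the fine-tuning arguments. The warning itself names the two ways out: update the NVIDIA driver, or install a PyTorch build compiled for the CUDA version the current driver supports.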