DeepSeek-Coder
How to fine-tune on a single GPU
cd finetune && deepspeed finetune_deepseekcoder.py --model_name_or_path $MODEL_PATH --data_path $DATA_PATH --output_dir $OUTPUT_PATH --num_train_epochs 3 --model_max_length 1024 --per_device_train_batch_size 16 --per_device_eval_batch_size 1 --gradient_accumulation_steps 4 --evaluation_strategy "no" --save_strategy "steps" --save_steps 100 --save_total_limit 100 --learning_rate 2e-5 --warmup_steps 10 --logging_steps 1 --lr_scheduler_type "cosine" --gradient_checkpointing True --report_to "tensorboard" --deepspeed configs/ds_config_zero3.json --bf16 True
[2023-12-19 16:10:57,887] [INFO] [real_accelerator.py:161:get_accelerator] Setting ds_accelerator to cuda (auto detect)
/home/admin/miniconda3/envs/deepseek/lib/python3.10/site-packages/torch/cuda/__init__.py:138: UserWarning: CUDA initialization: The NVIDIA driver on your system is too old (found version 11070). Please update your GPU driver by downloading and installing a new version from the URL: http://www.nvidia.com/Download/index.aspx Alternatively, go to: https://pytorch.org to install a PyTorch version that has been compiled with your version of the CUDA driver. (Triggered internally at ../c10/cuda/CUDAFunctions.cpp:108.)
return torch._C._cuda_getDeviceCount() > 0
[2023-12-19 16:11:06,596] [WARNING] [runner.py:202:fetch_hostfile] Unable to find hostfile, will proceed with training with local resources only.
[2023-12-19 16:11:06,596] [INFO] [runner.py:570:main] cmd = /home/admin/miniconda3/envs/deepseek/bin/python -u -m deepspeed.launcher.launch --world_info=eyJsb2NhbGhvc3QiOiBbMF19 --master_addr=127.0.0.1 --master_port=29500 --enable_each_rank_log=None finetune_deepseekcoder.py --model_name_or_path deepseek-ai/deepseek-coder-6.7b-instruct --data_path ../data/nickroshEvol-Instruct-Code-80k-v1/EvolInstruct-Code-80k.json --output_dir ./outputs --num_train_epochs 3 --model_max_length 1024 --per_device_train_batch_size 16 --per_device_eval_batch_size 1 --gradient_accumulation_steps 4 --evaluation_strategy no --save_strategy steps --save_steps 100 --save_total_limit 100 --learning_rate 2e-5 --warmup_steps 10 --logging_steps 1 --lr_scheduler_type cosine --gradient_checkpointing True --report_to tensorboard --deepspeed configs/ds_config_zero3.json --bf16 True
[2023-12-19 16:11:12,734] [INFO] [real_accelerator.py:161:get_accelerator] Setting ds_accelerator to cuda (auto detect)
/home/admin/miniconda3/envs/deepseek/lib/python3.10/site-packages/torch/cuda/__init__.py:138: UserWarning: CUDA initialization: The NVIDIA driver on your system is too old (found version 11070). Please update your GPU driver by downloading and installing a new version from the URL: http://www.nvidia.com/Download/index.aspx Alternatively, go to: https://pytorch.org to install a PyTorch version that has been compiled with your version of the CUDA driver. (Triggered internally at ../c10/cuda/CUDAFunctions.cpp:108.)
return torch._C._cuda_getDeviceCount() > 0
[2023-12-19 16:11:16,782] [INFO] [launch.py:145:main] WORLD INFO DICT: {'localhost': [0]}
[2023-12-19 16:11:16,782] [INFO] [launch.py:151:main] nnodes=1, num_local_procs=1, node_rank=0
[2023-12-19 16:11:16,782] [INFO] [launch.py:162:main] global_rank_mapping=defaultdict(<class 'list'>, {'localhost': [0]})
[2023-12-19 16:11:16,782] [INFO] [launch.py:163:main] dist_world_size=1
[2023-12-19 16:11:16,782] [INFO] [launch.py:165:main] Setting CUDA_VISIBLE_DEVICES=0
/home/admin/miniconda3/envs/deepseek/lib/python3.10/site-packages/torch/cuda/__init__.py:138: UserWarning: CUDA initialization: The NVIDIA driver on your system is too old (found version 11070). Please update your GPU driver by downloading and installing a new version from the URL: http://www.nvidia.com/Download/index.aspx Alternatively, go to: https://pytorch.org to install a PyTorch version that has been compiled with your version of the CUDA driver. (Triggered internally at ../c10/cuda/CUDAFunctions.cpp:108.)
return torch._C._cuda_getDeviceCount() > 0
[2023-12-19 16:11:28,688] [INFO] [real_accelerator.py:161:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-12-19 16:11:30,064] [INFO] [comm.py:637:init_distributed] cdb=None
[2023-12-19 16:11:30,065] [INFO] [comm.py:668:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
Traceback (most recent call last):
File "/workspace/workdir/tevs_multi_idc_10g_20220825163730/lyq/DeepSeek-Coder/finetune/finetune_deepseekcoder.py", line 193, in
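It seems your environment has no usable GPU device. The warnings in your log show CUDA initialization failing because the NVIDIA driver is too old (found version 11070) for the installed PyTorch build, so PyTorch cannot see any GPU. As the warning suggests, either update the GPU driver or install a PyTorch version compiled for your driver's CUDA version.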
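To confirm this before launching DeepSpeed again, a quick check along the following lines may help. It is only a sketch: the helper names `decode_cuda_version` and `report_cuda_status` are made up for illustration, and it assumes the `found version 11070` figure uses CUDA's usual integer encoding (1000 * major + 10 * minor), which would make it 11.7.

```python
def decode_cuda_version(code: int) -> tuple[int, int]:
    """Decode CUDA's integer version encoding (1000 * major + 10 * minor)."""
    return code // 1000, (code % 1000) // 10


def report_cuda_status() -> list[str]:
    """Collect what PyTorch reports about CUDA (hypothetical helper)."""
    try:
        import torch
    except ImportError:
        return ["PyTorch is not installed in this environment"]
    return [
        # CUDA version the installed PyTorch wheel was built against
        # (None for CPU-only builds).
        f"PyTorch built for CUDA: {torch.version.cuda}",
        # False here means training on GPU cannot start.
        f"CUDA available: {torch.cuda.is_available()}",
        f"Visible GPUs: {torch.cuda.device_count()}",
    ]


if __name__ == "__main__":
    # Version number taken from the warning in the log above.
    print("Driver supports CUDA up to:", decode_cuda_version(11070))  # (11, 7)
    for line in report_cuda_status():
        print(line)
```

If `CUDA available` prints `False` and the CUDA version PyTorch was built for is newer than what the driver supports, that would be consistent with the warning in the log.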