DHS-LLM-Workshop
Fine Tuning with LoRA failed during train step
I am following your blog post https://huggingface.co/blog/personal-copilot and running the notebook linked from it: https://colab.research.google.com/drive/1Tz9KKgacppA4S6H4eo_sw43qEaC9lFLs?usp=sharing
!git pull
!python train.py \
--model_name_or_path "bigcode/starcoder" \
--dataset_name "smangrul/hf-stack-v1" \
--subset "data" \
--data_column "content" \
--splits "train" \
--seq_length 2048 \
--max_steps 2000 \
--batch_size 4 \
--gradient_accumulation_steps 4 \
--learning_rate 5e-4 \
--lr_scheduler_type "cosine" \
--weight_decay 0.01 \
--num_warmup_steps 30 \
--eval_freq 100 \
--save_freq 100 \
--log_freq 25 \
--num_workers 4 \
--bf16 \
--no_fp16 \
--output_dir "peft-lora-starcoder15B-v2-personal-copilot-A100-40GB-colab" \
--fim_rate 0.5 \
--fim_spm_rate 0.5 \
--use_peft_lora \
--lora_r 32 \
--lora_alpha 64 \
--lora_dropout 0.0 \
--lora_target_modules "c_proj,c_attn,q_attn,c_fc,c_proj" \
--use_flash_attn \
--use_4bit_qunatization \
--use_nested_quant \
--bnb_4bit_compute_dtype "bfloat16"
I am stuck at this step: the training command above fails before training starts.
Below is the full output, including the error:
Already up to date.
2024-05-09 20:44:58.617684: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-05-09 20:44:58.617733: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-05-09 20:44:58.619695: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-05-09 20:44:58.630452: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-05-09 20:45:00.111432: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
Traceback (most recent call last):
  File "/content/DHS-LLM-Workshop/personal_copilot/training/train.py", line 494, in <module>
    model_args, data_args, training_args = parser.parse_args_into_dataclasses()
  File "/usr/local/lib/python3.10/dist-packages/transformers/hf_argparser.py", line 348, in parse_args_into_dataclasses
    raise ValueError(f"Some specified arguments are not used by the HfArgumentParser: {remaining_args}")
ValueError: Some specified arguments are not used by the HfArgumentParser: ['--subset', 'data', '--data_column', 'content', '--seq_length', '2048', '--batch_size', '4', '--num_warmup_steps', '30', '--eval_freq', '100', '--save_freq', '100', '--log_freq', '25', '--num_workers', '4', '--no_fp16', '--use_4bit_qunatization']
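
For anyone reading the traceback: the error is raised while parsing the command line, before any model or dataset is loaded. The parser collects every token it does not recognize and fails if that list is non-empty. Below is a minimal sketch of that check using only the standard library `argparse`. It is not the `HfArgumentParser` implementation, and the `accepted` list is a hypothetical subset of flags chosen for illustration, not read from `train.py`.

```python
import argparse


def split_known_and_unknown(argv, accepted_flags):
    """Parse argv and return (namespace of known flags, list of leftover tokens)."""
    parser = argparse.ArgumentParser(allow_abbrev=False)
    for flag in accepted_flags:
        parser.add_argument(flag)
    # parse_known_args does not fail on unknown flags; it returns them instead.
    return parser.parse_known_args(argv)


# Hypothetical: flags the script is assumed to declare (illustration only).
accepted = ["--model_name_or_path", "--dataset_name", "--max_steps"]

# A shortened version of the command from this report.
argv = [
    "--model_name_or_path", "bigcode/starcoder",
    "--dataset_name", "smangrul/hf-stack-v1",
    "--max_steps", "2000",
    "--seq_length", "2048",
    "--no_fp16",
]

known, leftover = split_known_and_unknown(argv, accepted)
print(leftover)

# A strict parser turns a non-empty leftover list into an error.
if leftover:
    print(f"Some specified arguments are not used: {leftover}")
```

The leftover list in the traceback above therefore names exactly the flags (and their values) that the pulled revision of `train.py` does not declare, which suggests the notebook command and the script are out of sync. Comparing those flags against the argument dataclasses defined in `train.py` should show which names the script currently expects.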