
Fine-tuning with SQLcoder-7b

Open bhrt95 opened this issue 1 year ago • 0 comments

I'm new to this area of language models. For my use case, I want to fine-tune the SQLcoder model on the Spider dataset using this code base, since this repo has been working for me: following the instructions in the README, I'm able to start training the StarCoder model with the ArmelR/stack-exchange-instruction dataset.

I replaced the model path and the dataset name in the python command:

```bash
!python finetune/finetune.py --model_path="defog/sqlcoder-7b" --dataset_name="spider" --subset="data/finetune" --split="train" --size_valid_set 1000 --streaming --seq_length 1024 --max_steps 1000 --batch_size 1 --input_column_name="question" --output_column_name="query" --gradient_accumulation_steps 16 --learning_rate 1e-4 --lr_scheduler_type="cosine" --num_warmup_steps 100 --weight_decay 0.05 --output_dir="./checkpoints"
```
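As a side note, the --input_column_name and --output_column_name flags only make sense if the dataset really exposes those fields. Here is a minimal sanity-check sketch (my own check, not part of the finetune script) that assumes the Hugging Face spider dataset provides "question" and "query" columns:

```python
# Sanity-check sketch: verify the spider dataset has the columns passed to
# --input_column_name ("question") and --output_column_name ("query").
from datasets import load_dataset

spider = load_dataset("spider", split="train", streaming=True)
sample = next(iter(spider))
print(sample.keys())       # expected to include "question" and "query"
print(sample["question"])  # natural-language question
print(sample["query"])     # gold SQL query
```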

I'm facing an issue with the attention mask shape when training starts. I know that just changing the model path isn't enough to start training directly, so please give me some suggestions on how to get the training running. Here is a link to my Kaggle notebook to get started: https://www.kaggle.com/code/bhrt16/notebookb5fd138c63

This is the error log:

```
/opt/conda/lib/python3.10/site-packages/scipy/__init__.py:146: UserWarning: A NumPy version >=1.16.5 and <1.23.0 is required for this version of SciPy (detected version 1.24.3)
  warnings.warn(f"A NumPy version >={np_minversion} and <{np_maxversion}"
/opt/conda/lib/python3.10/site-packages/transformers/models/auto/tokenization_auto.py:691: FutureWarning: The `use_auth_token` argument is deprecated and will be removed in v5 of Transformers. Please use `token` instead.
  warnings.warn(
tokenizer_config.json: 100%|███████████████████| 915/915 [00:00<00:00, 4.98MB/s]
tokenizer.model: 100%|███████████████████████| 493k/493k [00:00<00:00, 1.11MB/s]
tokenizer.json: 100%|██████████████████████| 1.80M/1.80M [00:00<00:00, 51.6MB/s]
special_tokens_map.json: 100%|████████████████| 72.0/72.0 [00:00<00:00, 448kB/s]
/opt/conda/lib/python3.10/site-packages/datasets/load.py:2088: FutureWarning: 'use_auth_token' was deprecated in favor of 'token' in version 2.14.0 and will be removed in 3.0.0. You can remove this warning by passing 'token=<use_auth_token>' instead.
  warnings.warn(
Loading the dataset in streaming mode
100%|████████████████████████████████████████| 400/400 [00:03<00:00, 110.05it/s]
The character to token ratio of the dataset is: 3.16
Loading the model
/opt/conda/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py:472: FutureWarning: The `use_auth_token` argument is deprecated and will be removed in v5 of Transformers. Please use `token` instead.
  warnings.warn(
Loading checkpoint shards: 100%|██████████████████| 2/2 [01:07<00:00, 33.92s/it]
/opt/conda/lib/python3.10/site-packages/peft/utils/other.py:141: FutureWarning: prepare_model_for_int8_training is deprecated and will be removed in a future version. Use prepare_model_for_kbit_training instead.
  warnings.warn(
trainable params: 41943040 || all params: 3794014208 || trainable%: 1.1055056122762943
Starting main loop
Training...
wandb: Currently logged in as: bhrt95. Use `wandb login --relogin` to force relogin
wandb: Tracking run with wandb version 0.16.1
wandb: Run data is saved locally in /kaggle/working/starcoder/wandb/run-20231213_114310-6pzqbs68
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run StarCoder-finetuned
wandb: ⭐️ View project at https://wandb.ai/bhrt95/huggingface
wandb: 🚀 View run at https://wandb.ai/bhrt95/huggingface/runs/6pzqbs68
/opt/conda/lib/python3.10/site-packages/torch/utils/checkpoint.py:429: UserWarning: torch.utils.checkpoint: please pass in use_reentrant=True or use_reentrant=False explicitly. The default value of use_reentrant will be updated to be False in the future. To maintain current behavior, pass use_reentrant=True. It is recommended that you use use_reentrant=False. Refer to docs for more details on the differences between the two variants.
  warnings.warn(
Traceback (most recent call last):
  File "/kaggle/working/starcoder/finetune/finetune.py", line 326, in <module>
    main(args)
  File "/kaggle/working/starcoder/finetune/finetune.py", line 315, in main
    run_training(args, train_dataset, eval_dataset)
  File "/kaggle/working/starcoder/finetune/finetune.py", line 306, in run_training
    trainer.train()
  File "/opt/conda/lib/python3.10/site-packages/transformers/trainer.py", line 1540, in train
    return inner_training_loop(
  File "/opt/conda/lib/python3.10/site-packages/transformers/trainer.py", line 1857, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
  File "/opt/conda/lib/python3.10/site-packages/transformers/trainer.py", line 2735, in training_step
    self.accelerator.backward(loss)
  File "/opt/conda/lib/python3.10/site-packages/accelerate/accelerator.py", line 1905, in backward
    loss.backward(**kwargs)
  File "/opt/conda/lib/python3.10/site-packages/torch/_tensor.py", line 492, in backward
    torch.autograd.backward(
  File "/opt/conda/lib/python3.10/site-packages/torch/autograd/__init__.py", line 251, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
  File "/opt/conda/lib/python3.10/site-packages/torch/autograd/function.py", line 288, in apply
    return user_fn(self, *args)
  File "/opt/conda/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 271, in backward
    outputs = ctx.run_function(*detached_inputs)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/transformers/models/mistral/modeling_mistral.py", line 654, in forward
    hidden_states, self_attn_weights, present_key_value = self.self_attn(
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/transformers/models/mistral/modeling_mistral.py", line 293, in forward
    raise ValueError(
ValueError: Attention mask should be of size (1, 1, 1024, 2048), but is torch.Size([1, 1, 1024, 1024])
```
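Since the traceback goes through modeling_mistral.py, I assume defog/sqlcoder-7b is a Mistral-architecture model, and the expected mask size of (1, 1, 1024, 2048) looks like the 1024-token sequence plus another 1024 cached key/value positions surviving into the gradient-checkpointed re-forward. This is only my guess, not a confirmed diagnosis; the sketch below shows the workaround I would try first, i.e. turning off the KV cache before training:

```python
# Hedged sketch (my assumption, not a confirmed fix): disable the KV cache so
# the checkpointed backward pass recomputes attention with a mask that matches
# the 1024-token sequence instead of 1024 + 1024 cached positions.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("defog/sqlcoder-7b")
model.config.use_cache = False          # KV caching conflicts with gradient checkpointing
model.gradient_checkpointing_enable()   # checkpointing is clearly active per the traceback above
```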

bhrt95 · Dec 13 '23 11:12