starcoder
During finetuning (finetune_starcoder.py), I'm running out of GPU memory during the checkpoint-saving step (save_pretrained)
It looks like GPU memory usage nearly doubles during saving (save_pretrained / get_peft_model_state_dict). Is there a way to avoid this? A sketch of the mitigations the error message itself suggests is included after the trace.
stack trace:
```
File "finetune_starcoder.py", line 343, in <module>
main(args)
File "finetune_starcoder.py", line 332, in main
run_training(args, train_dataset, eval_dataset)
File "finetune_starcoder.py", line 323, in run_training
trainer.train()
File "/home/ubuntu/.local/lib/python3.8/site-packages/transformers/trainer.py", line 1662, in train
return inner_training_loop(
File "/home/ubuntu/.local/lib/python3.8/site-packages/transformers/trainer.py", line 2006, in _inner_training_loop
self._maybe_log_save_evaluate(tr_loss, model, trial, epoch, ignore_keys_for_eval)
File "/home/ubuntu/.local/lib/python3.8/site-packages/transformers/trainer.py", line 2291, in _maybe_log_save_evaluate
self._save_checkpoint(model, trial, metrics=metrics)
File "/home/ubuntu/.local/lib/python3.8/site-packages/transformers/trainer.py", line 2348, in _save_checkpoint
self.save_model(output_dir, _internal_call=True)
File "/home/ubuntu/.local/lib/python3.8/site-packages/transformers/trainer.py", line 2830, in save_model
self._save(output_dir)
File "/home/ubuntu/.local/lib/python3.8/site-packages/transformers/trainer.py", line 2873, in _save
state_dict = self.model.state_dict()
File "/home/ubuntu/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1818, in state_dict
module.state_dict(destination=destination, prefix=prefix + name + '.', keep_vars=keep_vars)
File "/home/ubuntu/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1818, in state_dict
module.state_dict(destination=destination, prefix=prefix + name + '.', keep_vars=keep_vars)
File "/home/ubuntu/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1818, in state_dict
module.state_dict(destination=destination, prefix=prefix + name + '.', keep_vars=keep_vars)
[Previous line repeated 4 more times]
File "/home/ubuntu/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1815, in state_dict
self._save_to_state_dict(destination, prefix, keep_vars)
File "/home/ubuntu/.local/lib/python3.8/site-packages/bitsandbytes/nn/modules.py", line 268, in _save_to_state_dict
self.weight.data = undo_layout(self.state.CxB, self.state.tile_indices)
File "/home/ubuntu/.local/lib/python3.8/site-packages/bitsandbytes/autograd/_functions.py", line 100, in undo_layout
return outputs.reshape(rows, cols).contiguous()
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 36.00 MiB (GPU 0; 39.56 GiB total capacity; 36.25 GiB already allocated; 24.56 MiB free; 37.79 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
0%| | 1/8000 [00:22<49:33:48, 22.31s/it]
```
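For reference, here is a minimal sketch of the two mitigations the error message itself points at: setting PYTORCH_CUDA_ALLOC_CONF to reduce allocator fragmentation, and emptying the CUDA cache right before each checkpoint is written. The callback name and the max_split_size_mb value are illustrative; this only frees cached blocks and does not remove the extra copy that bitsandbytes' _save_to_state_dict makes.

```
import os

# Set the allocator config before torch/CUDA is initialised
# (e.g. at the very top of finetune_starcoder.py); the value is illustrative.
os.environ.setdefault("PYTORCH_CUDA_ALLOC_CONF", "max_split_size_mb:128")

import torch
from transformers import TrainerCallback


class EmptyCacheBeforeSaveCallback(TrainerCallback):
    """Free cached allocator blocks right before a checkpoint is written,
    leaving a bit more headroom for the extra copy bitsandbytes makes
    in _save_to_state_dict."""

    def on_step_end(self, args, state, control, **kwargs):
        # DefaultFlowCallback runs first, so control.should_save already
        # reflects whether this step triggers a checkpoint.
        if control.should_save:
            torch.cuda.empty_cache()
        return control


# Usage: trainer = Trainer(..., callbacks=[EmptyCacheBeforeSaveCallback()])
```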
I am encountering the same issue using a single A100 (40 GiB) for fine-tuning.
Solved it by downgrading to bitsandbytes==0.37.2 (`pip install bitsandbytes==0.37.2`).
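In case it helps others, a quick sanity-check sketch to confirm which bitsandbytes build the training environment actually imports after the downgrade:

```
# Confirm the runtime picks up the downgraded bitsandbytes
# (installed with: pip install bitsandbytes==0.37.2).
from importlib.metadata import version

import torch

print("bitsandbytes:", version("bitsandbytes"))
print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
```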
Facing the same issue with 4x A40 (48 GB) GPUs.
What's the recommended number and type of GPUs for finetuning this model?
Any update?