starcoder
During finetuning (finetune_starcoder.py), I'm running out of GPU memory during the checkpoint-saving step (save_pretrained)
It looks like GPU memory usage nearly doubles during saving (save_pretrained / get_peft_model_state_dict). Is there a way to avoid this? A sketch of the mitigations the error message itself suggests is included after the trace.
stack trace:
```
File "finetune_starcoder.py", line 343, in <module>
main(args)
File "finetune_starcoder.py", line 332, in main
run_training(args, train_dataset, eval_dataset)
File "finetune_starcoder.py", line 323, in run_training
trainer.train()
File "/home/ubuntu/.local/lib/python3.8/site-packages/transformers/trainer.py", line 1662, in train
return inner_training_loop(
File "/home/ubuntu/.local/lib/python3.8/site-packages/transformers/trainer.py", line 2006, in _inner_training_loop
self._maybe_log_save_evaluate(tr_loss, model, trial, epoch, ignore_keys_for_eval)
File "/home/ubuntu/.local/lib/python3.8/site-packages/transformers/trainer.py", line 2291, in _maybe_log_save_evaluate
self._save_checkpoint(model, trial, metrics=metrics)
File "/home/ubuntu/.local/lib/python3.8/site-packages/transformers/trainer.py", line 2348, in _save_checkpoint
self.save_model(output_dir, _internal_call=True)
File "/home/ubuntu/.local/lib/python3.8/site-packages/transformers/trainer.py", line 2830, in save_model
self._save(output_dir)
File "/home/ubuntu/.local/lib/python3.8/site-packages/transformers/trainer.py", line 2873, in _save
state_dict = self.model.state_dict()
File "/home/ubuntu/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1818, in state_dict
module.state_dict(destination=destination, prefix=prefix + name + '.', keep_vars=keep_vars)
File "/home/ubuntu/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1818, in state_dict
module.state_dict(destination=destination, prefix=prefix + name + '.', keep_vars=keep_vars)
File "/home/ubuntu/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1818, in state_dict
module.state_dict(destination=destination, prefix=prefix + name + '.', keep_vars=keep_vars)
[Previous line repeated 4 more times]
File "/home/ubuntu/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1815, in state_dict
self._save_to_state_dict(destination, prefix, keep_vars)
File "/home/ubuntu/.local/lib/python3.8/site-packages/bitsandbytes/nn/modules.py", line 268, in _save_to_state_dict
self.weight.data = undo_layout(self.state.CxB, self.state.tile_indices)
File "/home/ubuntu/.local/lib/python3.8/site-packages/bitsandbytes/autograd/_functions.py", line 100, in undo_layout
return outputs.reshape(rows, cols).contiguous()
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 36.00 MiB (GPU 0; 39.56 GiB total capacity; 36.25 GiB already allocated; 24.56 MiB free; 37.79 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
0%| | 1/8000 [00:22<49:33:48, 22.31s/it]
```
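For reference, here is a minimal sketch of the two mitigations the error message itself points at: setting PYTORCH_CUDA_ALLOC_CONF to reduce allocator fragmentation, and emptying the CUDA cache right before each checkpoint is written. The callback name and the max_split_size_mb value are illustrative; this only frees cached blocks and does not remove the extra copy that bitsandbytes' _save_to_state_dict makes.

```
import os

# Set the allocator config before torch/CUDA is initialised
# (e.g. at the very top of finetune_starcoder.py); the value is illustrative.
os.environ.setdefault("PYTORCH_CUDA_ALLOC_CONF", "max_split_size_mb:128")

import torch
from transformers import TrainerCallback


class EmptyCacheBeforeSaveCallback(TrainerCallback):
    """Free cached allocator blocks right before a checkpoint is written,
    leaving a bit more headroom for the extra copy bitsandbytes makes
    in _save_to_state_dict."""

    def on_step_end(self, args, state, control, **kwargs):
        # DefaultFlowCallback runs first, so control.should_save already
        # reflects whether this step triggers a checkpoint.
        if control.should_save:
            torch.cuda.empty_cache()
        return control


# Usage: trainer = Trainer(..., callbacks=[EmptyCacheBeforeSaveCallback()])
```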
I am encountering the same issue using a single A100 (40 GiB) for fine-tuning.
Solved it by downgrading to bitsandbytes==0.37.2 (`pip install bitsandbytes==0.37.2`).
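In case it helps others, a quick sanity-check sketch to confirm which bitsandbytes build the training environment actually imports after the downgrade:

```
# Confirm the runtime picks up the downgraded bitsandbytes
# (installed with: pip install bitsandbytes==0.37.2).
from importlib.metadata import version

import torch

print("bitsandbytes:", version("bitsandbytes"))
print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
```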
Facing the same issue with 4x A40 (48 GB) GPUs.
What's the recommended number and type of GPUs for finetuning this model?
Any update?