simple-llm-finetuner
Getting OOM
Training on T4:
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 44.00 MiB (GPU 0; 14.56 GiB total capacity; 13.25 GiB already allocated; 10.44 MiB free; 13.83 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
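The error text itself suggests one mitigation: capping the caching allocator's split size via PYTORCH_CUDA_ALLOC_CONF. A minimal sketch of that, with 128 MiB as a purely illustrative value, not a recommendation:

```python
# Hedged mitigation from the error message itself: cap the caching allocator's
# split size to reduce fragmentation. Set this before any CUDA allocation
# happens; 128 MiB is only an illustrative value.
import os

os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

import torch  # import torch / start training only after the variable is set
```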
I suspect a change of versions in peft or transformers... Does that make sense?
Same. This didn't use to happen.
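If a dependency bump is the suspect, it helps to record the exact versions in the failing environment so a known-good setup can be diffed against it. A small diagnostic snippet (package names only, no version claims):

```python
# Print the installed versions of the packages most likely involved,
# so a working and a failing environment can be compared.
from importlib.metadata import version

for pkg in ("torch", "transformers", "peft", "bitsandbytes", "gradio"):
    print(pkg, version(pkg))
```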
{'loss': 1.1006, 'learning_rate': 2.748091603053435e-05, 'epoch': 0.92}
{'train_runtime': 350.2814, 'train_samples_per_second': 0.374, 'train_steps_per_second': 0.374, 'train_loss': 1.0609159615203625, 'epoch': 1.0}
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/dist-packages/gradio/routes.py", line 395, in run_predict
    output = await app.get_blocks().process_api(
  File "/usr/local/lib/python3.9/dist-packages/gradio/blocks.py", line 1193, in process_api
    result = await self.call_function(
  File "/usr/local/lib/python3.9/dist-packages/gradio/blocks.py", line 916, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "/usr/local/lib/python3.9/dist-packages/anyio/to_thread.py", line 31, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "/usr/local/lib/python3.9/dist-packages/anyio/_backends/_asyncio.py", line 937, in run_sync_in_worker_thread
    return await future
  File "/usr/local/lib/python3.9/dist-packages/anyio/_backends/_asyncio.py", line 867, in run
    result = context.run(func, *args)
  File "/usr/local/lib/python3.9/dist-packages/gradio/helpers.py", line 588, in tracked_fn
    response = fn(*args)
  File "/content/simple-llama-finetuner/main.py", line 253, in tokenize_and_train
    model.save_pretrained(output_dir)
  File "/usr/local/lib/python3.9/dist-packages/peft/peft_model.py", line 116, in save_pretrained
    output_state_dict = get_peft_model_state_dict(
  File "/usr/local/lib/python3.9/dist-packages/peft/utils/save_and_load.py", line 32, in get_peft_model_state_dict
    state_dict = model.state_dict()
  File "/usr/local/lib/python3.9/dist-packages/torch/nn/modules/module.py", line 1818, in state_dict
    module.state_dict(destination=destination, prefix=prefix + name + '.', keep_vars=keep_vars)
  File "/usr/local/lib/python3.9/dist-packages/torch/nn/modules/module.py", line 1818, in state_dict
    module.state_dict(destination=destination, prefix=prefix + name + '.', keep_vars=keep_vars)
  File "/usr/local/lib/python3.9/dist-packages/torch/nn/modules/module.py", line 1818, in state_dict
    module.state_dict(destination=destination, prefix=prefix + name + '.', keep_vars=keep_vars)
  [Previous line repeated 4 more times]
  File "/usr/local/lib/python3.9/dist-packages/torch/nn/modules/module.py", line 1815, in state_dict
    self._save_to_state_dict(destination, prefix, keep_vars)
  File "/usr/local/lib/python3.9/dist-packages/bitsandbytes/nn/modules.py", line 268, in _save_to_state_dict
    self.weight.data = undo_layout(self.state.CxB, self.state.tile_indices)
  File "/usr/local/lib/python3.9/dist-packages/bitsandbytes/autograd/_functions.py", line 96, in undo_layout
    outputs = torch.empty_like(tensor) # note: not using .index_copy because it was slower on cuda
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 68.00 MiB (GPU 0; 39.56 GiB total capacity; 35.96 GiB already allocated; 4.56 MiB free; 37.96 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
Keyboard interruption in main thread... closing server.
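Reading the trace, the OOM is not raised during training but while saving the LoRA adapter: bitsandbytes' _save_to_state_dict calls undo_layout, which allocates a temporary copy of the 8-bit weights via torch.empty_like, and on a GPU that is already nearly full after training that extra allocation fails. One possible stopgap (a sketch, not a guaranteed fix) is to release cached allocator blocks right before saving; `model` and `output_dir` here refer to the same names used in main.py's tokenize_and_train:

```python
# Stopgap sketch, not a guaranteed fix: free cached allocator blocks right
# before saving, so undo_layout's temporary copy of the 8-bit weights has
# room. `model` and `output_dir` are the names used in main.py.
import gc

import torch

gc.collect()
torch.cuda.empty_cache()
model.save_pretrained(output_dir)
```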
In my case it was the bitsandbytes error. As discussed in the issue linked below, downgrading to bitsandbytes==0.37.2 makes the problem go away.
https://github.com/TimDettmers/bitsandbytes/issues/324
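For completeness, a one-liner to apply that pin from inside the running environment (a plain `pip install bitsandbytes==0.37.2` in a terminal works just as well):

```python
# Apply the workaround version pin from the linked issue inside the
# currently running interpreter's environment.
import subprocess
import sys

subprocess.check_call([sys.executable, "-m", "pip", "install", "bitsandbytes==0.37.2"])
```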