text-generation-webui
LoRA training fails to save checkpoint
Describe the bug
I am able to train, but as soon as it tries to save the checkpoint I get the following error. This only occurs on the new installer version with webui.py; the previous version saves checkpoints fine.
Is there an existing issue for this?
- [X] I have searched the existing issues
Reproduction
Install the new Windows-installer version of text-generation-webui, then run a LoRA training session until it tries to save a checkpoint.
Screenshot
No response
Logs
Exception in thread Thread-7 (threaded_run):
Traceback (most recent call last):
File "C:\Users\ermar\OneDrive\Desktop\LLM_Folder\installer_files\env\lib\threading.py", line 1016, in _bootstrap_inner
self.run()
File "C:\Users\ermar\OneDrive\Desktop\LLM_Folder\installer_files\env\lib\threading.py", line 953, in run
self._target(*self._args, **self._kwargs)
File "C:\Users\ermar\OneDrive\Desktop\LLM_Folder\text-generation-webui\modules\training.py", line 416, in threaded_run
trainer.train()
File "C:\Users\ermar\OneDrive\Desktop\LLM_Folder\installer_files\env\lib\site-packages\transformers\trainer.py", line 1662, in train
return inner_training_loop(
File "C:\Users\ermar\OneDrive\Desktop\LLM_Folder\installer_files\env\lib\site-packages\transformers\trainer.py", line 1918, in _inner_training_loop
self.control = self.callback_handler.on_step_begin(args, self.state, self.control)
File "C:\Users\ermar\OneDrive\Desktop\LLM_Folder\installer_files\env\lib\site-packages\transformers\trainer_callback.py", line 369, in on_step_begin
return self.call_event("on_step_begin", args, state, control)
File "C:\Users\ermar\OneDrive\Desktop\LLM_Folder\installer_files\env\lib\site-packages\transformers\trainer_callback.py", line 397, in call_event
result = getattr(callback, event)(
File "C:\Users\ermar\OneDrive\Desktop\LLM_Folder\text-generation-webui\modules\training.py", line 363, in on_step_begin
lora_model.save_pretrained(f"{lora_file_path}/checkpoint-{tracked.current_steps}/")
File "C:\Users\ermar\OneDrive\Desktop\LLM_Folder\installer_files\env\lib\site-packages\peft\peft_model.py", line 125, in save_pretrained
output_state_dict = get_peft_model_state_dict(
File "C:\Users\ermar\OneDrive\Desktop\LLM_Folder\installer_files\env\lib\site-packages\peft\utils\save_and_load.py", line 32, in get_peft_model_state_dict
state_dict = model.state_dict()
File "C:\Users\ermar\OneDrive\Desktop\LLM_Folder\installer_files\env\lib\site-packages\torch\nn\modules\module.py", line 1818, in state_dict
module.state_dict(destination=destination, prefix=prefix + name + '.', keep_vars=keep_vars)
File "C:\Users\ermar\OneDrive\Desktop\LLM_Folder\installer_files\env\lib\site-packages\torch\nn\modules\module.py", line 1818, in state_dict
module.state_dict(destination=destination, prefix=prefix + name + '.', keep_vars=keep_vars)
File "C:\Users\ermar\OneDrive\Desktop\LLM_Folder\installer_files\env\lib\site-packages\torch\nn\modules\module.py", line 1818, in state_dict
module.state_dict(destination=destination, prefix=prefix + name + '.', keep_vars=keep_vars)
[Previous line repeated 4 more times]
File "C:\Users\ermar\OneDrive\Desktop\LLM_Folder\installer_files\env\lib\site-packages\torch\nn\modules\module.py", line 1815, in state_dict
self._save_to_state_dict(destination, prefix, keep_vars)
File "C:\Users\ermar\OneDrive\Desktop\LLM_Folder\installer_files\env\lib\site-packages\bitsandbytes\nn\modules.py", line 268, in _save_to_state_dict
self.weight.data = undo_layout(self.state.CxB, self.state.tile_indices)
File "C:\Users\ermar\OneDrive\Desktop\LLM_Folder\installer_files\env\lib\site-packages\bitsandbytes\autograd\_functions.py", line 100, in undo_layout
return outputs.reshape(rows, cols).contiguous()
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 16.00 MiB (GPU 0; 12.00 GiB total capacity; 10.64 GiB already allocated; 0 bytes free; 11.25 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
Training complete, saving...
Traceback (most recent call last):
File "C:\Users\ermar\OneDrive\Desktop\LLM_Folder\installer_files\env\lib\site-packages\gradio\routes.py", line 395, in run_predict
output = await app.get_blocks().process_api(
File "C:\Users\ermar\OneDrive\Desktop\LLM_Folder\installer_files\env\lib\site-packages\gradio\blocks.py", line 1193, in process_api
result = await self.call_function(
File "C:\Users\ermar\OneDrive\Desktop\LLM_Folder\installer_files\env\lib\site-packages\gradio\blocks.py", line 930, in call_function
prediction = await anyio.to_thread.run_sync(
File "C:\Users\ermar\OneDrive\Desktop\LLM_Folder\installer_files\env\lib\site-packages\anyio\to_thread.py", line 31, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
File "C:\Users\ermar\OneDrive\Desktop\LLM_Folder\installer_files\env\lib\site-packages\anyio\_backends\_asyncio.py", line 937, in run_sync_in_worker_thread
return await future
File "C:\Users\ermar\OneDrive\Desktop\LLM_Folder\installer_files\env\lib\site-packages\anyio\_backends\_asyncio.py", line 867, in run
result = context.run(func, *args)
File "C:\Users\ermar\OneDrive\Desktop\LLM_Folder\installer_files\env\lib\site-packages\gradio\utils.py", line 491, in async_iteration
return next(iterator)
File "C:\Users\ermar\OneDrive\Desktop\LLM_Folder\text-generation-webui\modules\training.py", line 452, in do_train
lora_model.save_pretrained(lora_file_path)
File "C:\Users\ermar\OneDrive\Desktop\LLM_Folder\installer_files\env\lib\site-packages\peft\peft_model.py", line 125, in save_pretrained
output_state_dict = get_peft_model_state_dict(
File "C:\Users\ermar\OneDrive\Desktop\LLM_Folder\installer_files\env\lib\site-packages\peft\utils\save_and_load.py", line 32, in get_peft_model_state_dict
state_dict = model.state_dict()
File "C:\Users\ermar\OneDrive\Desktop\LLM_Folder\installer_files\env\lib\site-packages\torch\nn\modules\module.py", line 1818, in state_dict
module.state_dict(destination=destination, prefix=prefix + name + '.', keep_vars=keep_vars)
File "C:\Users\ermar\OneDrive\Desktop\LLM_Folder\installer_files\env\lib\site-packages\torch\nn\modules\module.py", line 1818, in state_dict
module.state_dict(destination=destination, prefix=prefix + name + '.', keep_vars=keep_vars)
File "C:\Users\ermar\OneDrive\Desktop\LLM_Folder\installer_files\env\lib\site-packages\torch\nn\modules\module.py", line 1818, in state_dict
module.state_dict(destination=destination, prefix=prefix + name + '.', keep_vars=keep_vars)
[Previous line repeated 4 more times]
File "C:\Users\ermar\OneDrive\Desktop\LLM_Folder\installer_files\env\lib\site-packages\torch\nn\modules\module.py", line 1815, in state_dict
self._save_to_state_dict(destination, prefix, keep_vars)
File "C:\Users\ermar\OneDrive\Desktop\LLM_Folder\installer_files\env\lib\site-packages\bitsandbytes\nn\modules.py", line 268, in _save_to_state_dict
self.weight.data = undo_layout(self.state.CxB, self.state.tile_indices)
File "C:\Users\ermar\OneDrive\Desktop\LLM_Folder\installer_files\env\lib\site-packages\bitsandbytes\autograd\_functions.py", line 100, in undo_layout
return outputs.reshape(rows, cols).contiguous()
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 44.00 MiB (GPU 0; 12.00 GiB total capacity; 10.62 GiB already allocated; 0 bytes free; 11.25 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
System Info
3080 Ti, running locally
Also getting this using a 4090.
@mcmonkey4eva
See error report @ https://github.com/TimDettmers/bitsandbytes/issues/324
Users previously reported that `pip install bitsandbytes==0.37.2` avoids the OOM issue, though it's a pain to install on Windows.
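(Separately, the allocator hint in the OOM message itself can be worth a try before downgrading. A minimal sketch, assuming the variable is set before torch initializes CUDA; the 512 MiB split size is an arbitrary example value, not a recommendation from this thread:)

```python
import os

# Must be set before torch allocates any CUDA memory, so do it before
# importing torch (or export it in the shell / set it in the Windows
# environment variables before launching the webui).
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:512"

import torch  # the allocator now honors the setting for this process
```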
> See error report @ TimDettmers/bitsandbytes#324
> Users previously reported that `pip install bitsandbytes==0.37.2` avoids the OOM issue, though it's a pain to install on Windows.
On native Windows this made things worse for me. On WSL it seems to have resolved the issue.
I also confirm that `pip install bitsandbytes==0.37.2` messes things up for me on Windows, but it fixed the problem in WSL. I also had to do an additional step in WSL: replacing `bitsandbytes_cpu.so` with `bitsandbytes_cuda117.so` in `\my_user_name\miniconda3\envs\my_env_name\lib\python3.10\site-packages\bitsandbytes`. Without this step, I cannot load models in 8-bit in order to train a LoRA.
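(For reference, that swap can be scripted. A minimal sketch, assuming the miniconda layout described above; the env path, user name, and the exact library filenames are placeholders to adjust for your install:)

```python
import shutil
from pathlib import Path

# Placeholder path: substitute your own user and env names.
# Filenames are as given in the comment above; on some installs they
# carry a "lib" prefix (libbitsandbytes_*.so), so check the directory first.
pkg = Path.home() / "miniconda3/envs/my_env_name/lib/python3.10/site-packages/bitsandbytes"

cpu_lib = pkg / "bitsandbytes_cpu.so"
cuda_lib = pkg / "bitsandbytes_cuda117.so"

shutil.copyfile(cpu_lib, pkg / "bitsandbytes_cpu.so.bak")  # keep a backup of the CPU build
shutil.copyfile(cuda_lib, cpu_lib)                         # CUDA 11.7 build takes the CPU slot
```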
> See error report @ TimDettmers/bitsandbytes#324
> Users previously reported that `pip install bitsandbytes==0.37.2` avoids the OOM issue, though it's a pain to install on Windows.
This helped me, thank you very much!
LoRA training fails with a Torch out-of-memory error at the end of training (10 seconds left; see the error message at the end), but if I pin bitsandbytes to 0.37.2, even text generation fails (on Linux, RTX 3080, 10 GB).
Error when the LoRA training result is supposed to be saved:
warnings.warn(
cuBLAS API failed with status 15
A: torch.Size([33, 2560]), B: torch.Size([7680, 2560]), C: (33, 7680); (lda, ldb, ldc): (c_int(1056), c_int(245760), c_int(1056)); (m, n, k): (c_int(33), c_int(7680), c_int(2560))
Traceback (most recent call last):
File "/app/modules/callbacks.py", line 73, in gentask
ret = self.mfunc(callback=_callback, **self.kwargs)
File "/app/modules/text_generation.py", line 251, in generate_with_callback
shared.model.generate(**kwargs)
File "/app/venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/app/venv/lib/python3.10/site-packages/transformers/generation/utils.py", line 1485, in generate
return self.sample(
File "/app/venv/lib/python3.10/site-packages/transformers/generation/utils.py", line 2524, in sample
outputs = self(
File "/app/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/app/venv/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/app/venv/lib/python3.10/site-packages/transformers/models/gpt_neox/modeling_gpt_neox.py", line 662, in forward
outputs = self.gpt_neox(
File "/app/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/app/venv/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/app/venv/lib/python3.10/site-packages/transformers/models/gpt_neox/modeling_gpt_neox.py", line 545, in forward
outputs = torch.utils.checkpoint.checkpoint(
File "/app/venv/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 249, in checkpoint
return CheckpointFunction.apply(function, preserve, *args)
File "/app/venv/lib/python3.10/site-packages/torch/autograd/function.py", line 506, in apply
return super().apply(*args, **kwargs)  # type: ignore[misc]
File "/app/venv/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 107, in forward
outputs = run_function(*args)
File "/app/venv/lib/python3.10/site-packages/transformers/models/gpt_neox/modeling_gpt_neox.py", line 541, in custom_forward
return module(*inputs, use_cache, None, output_attentions)
File "/app/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/app/venv/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/app/venv/lib/python3.10/site-packages/transformers/models/gpt_neox/modeling_gpt_neox.py", line 320, in forward
attention_layer_outputs = self.attention(
File "/app/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/app/venv/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/app/venv/lib/python3.10/site-packages/transformers/models/gpt_neox/modeling_gpt_neox.py", line 116, in forward
qkv = self.query_key_value(hidden_states)
File "/app/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/app/venv/lib/python3.10/site-packages/peft/tuners/lora.py", line 698, in forward
result = super().forward(x)
File "/app/venv/lib/python3.10/site-packages/bitsandbytes/nn/modules.py", line 242, in forward
out = bnb.matmul(x, self.weight, bias=self.bias, state=self.state)
File "/app/venv/lib/python3.10/site-packages/bitsandbytes/autograd/_functions.py", line 488, in matmul
return MatMul8bitLt.apply(A, B, out, bias, state)
File "/app/venv/lib/python3.10/site-packages/torch/autograd/function.py", line 506, in apply
return super().apply(*args, **kwargs)  # type: ignore[misc]
File "/app/venv/lib/python3.10/site-packages/bitsandbytes/autograd/_functions.py", line 377, in forward
out32, Sout32 = F.igemmlt(C32A, state.CxB, SA, state.SB)
File "/app/venv/lib/python3.10/site-packages/bitsandbytes/functional.py", line 1410, in igemmlt
raise Exception('cublasLt ran into an error!')
Exception: cublasLt ran into an error!
error detected
Output generated in 0.24 seconds (0.00 tokens/s, 0 tokens, context 33, seed 478163328)
Training needs some attention on Windows; LoRA adapters are also very buggy (adding, removing, stacking). Please give this some priority. Thank you.
@hypersniper05 I wish I could do more, but the problem isn't that text-gen-webui doesn't work on Windows (it works fine in itself); it's that the upstream libraries we depend on for the internals aren't well tested on Windows. bitsandbytes in particular is a bit infamous at this point for how unstable its Windows compatibility is, and it is the core of the issue here. I'll test whether it's possible to bypass bitsandbytes entirely. It might be?
Test-ran training on Windows. Just ran the latest one-click installer and followed the monkeypatch install guide (the only thing that was weird was having to shove `run_cmd("python -m pip install git+https://github.com/sterlind/GPTQ-for-LLaMa.git@lora_4bit")` into `webui.py` to do the install, rather than figuring out the 'proper' way to run miniconda pip installs, lol). Ran perfectly, and didn't even get a VRAM spike from the save for that matter. Everything 'just worked' for me.
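(For anyone reproducing this, a rough sketch of the hack described above, assuming the one-click installer's webui.py and its existing run_cmd helper; the exact placement is a guess, anywhere that runs after the conda environment is activated should do:)

```python
# Inside the one-click installer's webui.py, next to its other
# run_cmd(...) install steps. run_cmd is the helper webui.py already
# defines for running shell commands inside the installer's conda env.
run_cmd("python -m pip install git+https://github.com/sterlind/GPTQ-for-LLaMa.git@lora_4bit")
```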
I uninstalled bitsandbytes, and training with the 4-bit monkeypatch on Windows just works anyway. As long as you're not actually using 8-bit mode, I think you can just get rid of it and be good?
As for Linux: replace `bitsandbytes` in `requirements.txt` with `bitsandbytes==0.37.0` to make it work! 0.37.2 seems to be buggy.
@sammyf
> As for Linux: replace `bitsandbytes` in `requirements.txt` with `bitsandbytes==0.37.0` to make it work! 0.37.2 seems to be buggy.
Thank you, I've tried `bitsandbytes==0.37.2`, and the OOM issue is gone. But may I ask what sort of buggy behavior you meant? (It runs alright for me, but I wonder if I should switch to `0.37.0` as you suggested.)
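(If you're unsure which version you actually ended up with after these installs and downgrades, a quick check that doesn't assume bitsandbytes exports a version attribute:)

```python
# Report the installed bitsandbytes version via package metadata
# (stdlib only; works whether or not the module itself imports cleanly).
from importlib.metadata import version

print(version("bitsandbytes"))
```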
This issue has been closed due to inactivity for 6 weeks. If you believe it is still relevant, please leave a comment below. You can tag a developer in your comment.