ZeroDivisionError when loading LoRA
Describe the bug
Running on Windows (not WSL), loading the alpaca LoRA causes an error. I've set up accelerate with its config.
Is there an existing issue for this?
- [X] I have searched the existing issues
Reproduction
On Windows with miniconda, start server.py and select the alpaca LoRA. The error occurs.
Screenshot
No response
Logs
Traceback (most recent call last):
File "C:\Users\Arargd\miniconda3\envs\textgen\lib\site-packages\gradio\routes.py", line 374, in run_predict
output = await app.get_blocks().process_api(
File "C:\Users\Arargd\miniconda3\envs\textgen\lib\site-packages\gradio\blocks.py", line 1017, in process_api
result = await self.call_function(
File "C:\Users\Arargd\miniconda3\envs\textgen\lib\site-packages\gradio\blocks.py", line 835, in call_function
prediction = await anyio.to_thread.run_sync(
File "C:\Users\Arargd\miniconda3\envs\textgen\lib\site-packages\anyio\to_thread.py", line 31, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
File "C:\Users\Arargd\miniconda3\envs\textgen\lib\site-packages\anyio\_backends\_asyncio.py", line 937, in run_sync_in_worker_thread
return await future
File "C:\Users\Arargd\miniconda3\envs\textgen\lib\site-packages\anyio\_backends\_asyncio.py", line 867, in run
result = context.run(func, *args)
File "C:\textgenerators\text-generation-webui\server.py", line 73, in load_lora_wrapper
add_lora_to_model(selected_lora)
File "C:\textgenerators\text-generation-webui\modules\LoRA.py", line 22, in add_lora_to_model
shared.model = PeftModel.from_pretrained(shared.model, Path(f"loras/{lora_name}"), **params)
File "C:\Users\Arargd\miniconda3\envs\textgen\lib\site-packages\peft\peft_model.py", line 167, in from_pretrained
max_memory = get_balanced_memory(
File "C:\Users\Arargd\miniconda3\envs\textgen\lib\site-packages\accelerate\utils\modeling.py", line 454, in get_balanced_memory
per_gpu = module_sizes[""] // (num_devices - 1 if low_zero else num_devices)
ZeroDivisionError: integer division or modulo by zero
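For context, the failing line divides the model's total size by the number of GPUs accelerate believes it can use, so the division blows up whenever that count comes out as zero (e.g. accelerate doesn't see a usable CUDA device, or the per-GPU memory entries it builds are all zero). A minimal sketch of that arithmetic only, with a placeholder model size and a stand-in device count, not accelerate's actual code:

```python
import torch

# Sketch of the arithmetic that fails inside accelerate's get_balanced_memory().
# num_devices stands in for however accelerate counts usable GPUs; the model
# size is a placeholder for module_sizes[""].
num_devices = torch.cuda.device_count()  # 0 when no usable CUDA device is detected
total_model_size = 7_000_000_000         # placeholder byte count
low_zero = False

# Same expression as the line shown in the traceback above:
per_gpu = total_model_size // (num_devices - 1 if low_zero else num_devices)
# -> ZeroDivisionError when num_devices == 0
```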
System Info
Windows 10
RTX 2070
Now that a new commit has fixed WSL for me (so I can run it there), I can confirm that I get the same error in WSL as well.
OK, so if I add `return max_memory` at line 450 of accelerate's modeling.py (roughly as sketched below), I can avoid this error, though just returning max_memory isn't exactly "get_balanced_memory". From what I can tell it also messes with CPU mode and makes loading the LoRA take forever, so a better solution is probably advised.
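Roughly what that edit looks like, as a sketch only: this is not accelerate's exact code, line numbers shift between versions, and I believe get_max_memory is the existing helper that has already produced max_memory by that point in the function.

```python
# Inside get_balanced_memory() in accelerate/utils/modeling.py.
# Illustration of the hack only -- the surrounding code and line numbers
# differ between accelerate versions.

max_memory = get_max_memory(max_memory)  # existing call that fills in per-device defaults
return max_memory                        # <-- the added early return (the workaround)

# Everything after this point is skipped, including the line from the traceback:
#   per_gpu = module_sizes[""] // (num_devices - 1 if low_zero else num_devices)
# so the ZeroDivisionError never fires, but the memory map is no longer
# "balanced" -- likely why CPU mode and LoRA loading slow down with this hack.
```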
I edited my original comment to omit the part about it failing to generate after said fix, as I believe that is a separate (maybe related) issue regarding generation with the GPU at #412.