simple-llm-finetuner
Traceback during inference.
Running on Colab gives the following error during inference:
Traceback (most recent call last):
File "/usr/local/lib/python3.9/dist-packages/gradio/routes.py", line 394, in run_predict
output = await app.get_blocks().process_api(
File "/usr/local/lib/python3.9/dist-packages/gradio/blocks.py", line 1075, in process_api
result = await self.call_function(
File "/usr/local/lib/python3.9/dist-packages/gradio/blocks.py", line 884, in call_function
prediction = await anyio.to_thread.run_sync(
File "/usr/local/lib/python3.9/dist-packages/anyio/to_thread.py", line 31, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
File "/usr/local/lib/python3.9/dist-packages/anyio/_backends/_asyncio.py", line 937, in run_sync_in_worker_thread
return await future
File "/usr/local/lib/python3.9/dist-packages/anyio/_backends/_asyncio.py", line 867, in run
result = context.run(func, *args)
File "/usr/local/lib/python3.9/dist-packages/gradio/helpers.py", line 587, in tracked_fn
response = fn(*args)
File "/content/simple-llama-finetuner/main.py", line 121, in generate_text
generation_output = model.generate(
File "/usr/local/lib/python3.9/dist-packages/peft/peft_model.py", line 581, in generate
outputs = self.base_model.generate(**kwargs)
File "/usr/local/lib/python3.9/dist-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "/usr/local/lib/python3.9/dist-packages/transformers/generation/utils.py", line 1451, in generate
return self.sample(
File "/usr/local/lib/python3.9/dist-packages/transformers/generation/utils.py", line 2467, in sample
outputs = self(
File "/usr/local/lib/python3.9/dist-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/usr/local/lib/python3.9/dist-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/usr/local/lib/python3.9/dist-packages/transformers/models/llama/modeling_llama.py", line 765, in forward
outputs = self.model(
File "/usr/local/lib/python3.9/dist-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/usr/local/lib/python3.9/dist-packages/transformers/models/llama/modeling_llama.py", line 614, in forward
layer_outputs = decoder_layer(
File "/usr/local/lib/python3.9/dist-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/usr/local/lib/python3.9/dist-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/usr/local/lib/python3.9/dist-packages/transformers/models/llama/modeling_llama.py", line 309, in forward
hidden_states, self_attn_weights, present_key_value = self.self_attn(
File "/usr/local/lib/python3.9/dist-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/usr/local/lib/python3.9/dist-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/usr/local/lib/python3.9/dist-packages/transformers/models/llama/modeling_llama.py", line 209, in forward
query_states = self.q_proj(hidden_states).view(bsz, q_len, self.num_heads, self.head_dim).transpose(1, 2)
File "/usr/local/lib/python3.9/dist-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/usr/local/lib/python3.9/dist-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/usr/local/lib/python3.9/dist-packages/peft/tuners/lora.py", line 522, in forward
result = super().forward(x)
File "/usr/local/lib/python3.9/dist-packages/bitsandbytes/nn/modules.py", line 242, in forward
out = bnb.matmul(x, self.weight, bias=self.bias, state=self.state)
File "/usr/local/lib/python3.9/dist-packages/bitsandbytes/autograd/_functions.py", line 488, in matmul
return MatMul8bitLt.apply(A, B, out, bias, state)
File "/usr/local/lib/python3.9/dist-packages/bitsandbytes/autograd/_functions.py", line 317, in forward
state.CxB, state.SB = F.transform(state.CB, to_order=formatB)
File "/usr/local/lib/python3.9/dist-packages/bitsandbytes/functional.py", line 1698, in transform
prev_device = pre_call(A.device)
AttributeError: 'NoneType' object has no attribute 'device'
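For reference, the failing call is PEFT's generate() wrapper over an 8-bit base model. A minimal sketch of that inference path (the model name, adapter path, and prompt are placeholders, not the exact values in main.py):

import torch
from transformers import LlamaForCausalLM, LlamaTokenizer
from peft import PeftModel

base = "decapoda-research/llama-7b-hf"  # placeholder base model
model = LlamaForCausalLM.from_pretrained(
    base,
    load_in_8bit=True,   # weights end up in bitsandbytes Linear8bitLt layers
    device_map="auto",
)
model = PeftModel.from_pretrained(model, "./lora-adapter")  # placeholder adapter path
tokenizer = LlamaTokenizer.from_pretrained(base)

inputs = tokenizer("Hello", return_tensors="pt").to(model.device)
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=32)  # the call that raises above
print(tokenizer.decode(output[0]))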
Thanks for the report. Looks like my janky model loading/reloading isn't working. Fixing.
Should be fixed in latest master!
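For anyone still hitting this on an old checkout: the safe pattern for reloading an 8-bit model between runs is to drop every reference to the old one and clear the CUDA cache before loading again. A rough sketch (the function and flow here are illustrative, not the exact code in master):

import gc
import torch
from transformers import LlamaForCausalLM
from peft import PeftModel

model = None

def reload_model(base_name, lora_path=None):
    # Drop the previous model completely before loading a new one; a stale
    # Linear8bitLt layer whose quantized state was freed is one way to end
    # up with state.CB == None at matmul time.
    global model
    if model is not None:
        model = None
        gc.collect()
        torch.cuda.empty_cache()
    model = LlamaForCausalLM.from_pretrained(
        base_name, load_in_8bit=True, device_map="auto"
    )
    if lora_path is not None:
        model = PeftModel.from_pretrained(model, lora_path)
    return model.eval()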
@lxe I just did a git clone to try out your fix and I still get this on Colab :(
I just tried on a fresh Tesla T4 Colab and was unable to reproduce, using the vanilla model with no LoRA.
Just reproduced! Will fix!
https://github.com/tloen/alpaca-lora/issues/21
I would try again on Colab.
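The linked issue is the same bitsandbytes failure: state.CB on a Linear8bitLt layer is None by the time F.transform runs. A quick diagnostic (not a fix) to check whether a loaded model is already in that state before calling generate:

import bitsandbytes as bnb

def find_broken_8bit_layers(model):
    # List Linear8bitLt layers whose quantized weight state has been freed.
    # MatMul8bitLt reads state.CB in forward; if both CB and the cached CxB
    # are None, generate() dies with the AttributeError above.
    broken = []
    for name, module in model.named_modules():
        if isinstance(module, bnb.nn.Linear8bitLt):
            if module.state.CB is None and module.state.CxB is None:
                broken.append(name)
    return broken

# e.g. print(find_broken_8bit_layers(model))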
@lxe I just deleted my runtime to get a fresh environment, saw git clone pull a new copy of the code, and I still get the error on a GPU environment on Colab.
Error:
Traceback (most recent call last):
File "/usr/local/lib/python3.9/dist-packages/gradio/routes.py", line 394, in run_predict
output = await app.get_blocks().process_api(
File "/usr/local/lib/python3.9/dist-packages/gradio/blocks.py", line 1075, in process_api
result = await self.call_function(
File "/usr/local/lib/python3.9/dist-packages/gradio/blocks.py", line 884, in call_function
prediction = await anyio.to_thread.run_sync(
File "/usr/local/lib/python3.9/dist-packages/anyio/to_thread.py", line 31, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
File "/usr/local/lib/python3.9/dist-packages/anyio/_backends/_asyncio.py", line 937, in run_sync_in_worker_thread
return await future
File "/usr/local/lib/python3.9/dist-packages/anyio/_backends/_asyncio.py", line 867, in run
result = context.run(func, *args)
File "/usr/local/lib/python3.9/dist-packages/gradio/helpers.py", line 587, in tracked_fn
response = fn(*args)
File "/content/simple-llama-finetuner/main.py", line 104, in generate_text
output = model.generate( # type: ignore
File "/usr/local/lib/python3.9/dist-packages/peft/peft_model.py", line 581, in generate
outputs = self.base_model.generate(**kwargs)
File "/usr/local/lib/python3.9/dist-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "/usr/local/lib/python3.9/dist-packages/transformers/generation/utils.py", line 1462, in generate
return self.sample(
File "/usr/local/lib/python3.9/dist-packages/transformers/generation/utils.py", line 2478, in sample
outputs = self(
File "/usr/local/lib/python3.9/dist-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/usr/local/lib/python3.9/dist-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/usr/local/lib/python3.9/dist-packages/transformers/models/llama/modeling_llama.py", line 705, in forward
outputs = self.model(
File "/usr/local/lib/python3.9/dist-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/usr/local/lib/python3.9/dist-packages/transformers/models/llama/modeling_llama.py", line 593, in forward
layer_outputs = decoder_layer(
File "/usr/local/lib/python3.9/dist-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/usr/local/lib/python3.9/dist-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/usr/local/lib/python3.9/dist-packages/transformers/models/llama/modeling_llama.py", line 311, in forward
hidden_states, self_attn_weights, present_key_value = self.self_attn(
File "/usr/local/lib/python3.9/dist-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/usr/local/lib/python3.9/dist-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/usr/local/lib/python3.9/dist-packages/transformers/models/llama/modeling_llama.py", line 212, in forward
query_states = self.q_proj(hidden_states).view(bsz, q_len, self.num_heads, self.head_dim).transpose(1, 2)
File "/usr/local/lib/python3.9/dist-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/usr/local/lib/python3.9/dist-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/usr/local/lib/python3.9/dist-packages/peft/tuners/lora.py", line 522, in forward
result = super().forward(x)
File "/usr/local/lib/python3.9/dist-packages/bitsandbytes/nn/modules.py", line 242, in forward
out = bnb.matmul(x, self.weight, bias=self.bias, state=self.state)
File "/usr/local/lib/python3.9/dist-packages/bitsandbytes/autograd/_functions.py", line 488, in matmul
return MatMul8bitLt.apply(A, B, out, bias, state)
File "/usr/local/lib/python3.9/dist-packages/bitsandbytes/autograd/_functions.py", line 317, in forward
state.CxB, state.SB = F.transform(state.CB, to_order=formatB)
File "/usr/local/lib/python3.9/dist-packages/bitsandbytes/functional.py", line 1698, in transform
prev_device = pre_call(A.device)
AttributeError: 'NoneType' object has no attribute 'device'
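Until the 8-bit path behaves, one fallback that avoids MatMul8bitLt entirely is loading the base model in fp16 instead (standard transformers arguments; heavier on VRAM, and the names below are placeholders):

import torch
from transformers import LlamaForCausalLM
from peft import PeftModel

# fp16 load: no Linear8bitLt layers, so state.CB never comes into play
model = LlamaForCausalLM.from_pretrained(
    "decapoda-research/llama-7b-hf",  # placeholder base model
    torch_dtype=torch.float16,
    device_map="auto",
)
model = PeftModel.from_pretrained(model, "./lora-adapter")  # placeholder adapter path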