Loading model ...
Found 3 unique KN Linear values.
Warming up autotune cache ...
100%|█████████████████████████████████████████████████████████████████████████████████████████| 12/12 [00:34<00:00, 2.85s/it]
Found 1 unique fused mlp KN values.
Warming up autotune cache ...
100%|█████████████████████████████████████████████████████████████████████████████████████████| 12/12 [00:17<00:00, 1.45s/it]
Done.
Traceback (most recent call last):
  File "llama_inference.py", line 120, in <module>
    generated_ids = model.generate(
  File "/opt/conda/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/transformers/generation/utils.py", line 1485, in generate
    return self.sample(
  File "/opt/conda/lib/python3.8/site-packages/transformers/generation/utils.py", line 2524, in sample
    outputs = self(
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/transformers/models/llama/modeling_llama.py", line 687, in forward
    outputs = self.model(
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/transformers/models/llama/modeling_llama.py", line 577, in forward
    layer_outputs = decoder_layer(
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/transformers/models/llama/modeling_llama.py", line 292, in forward
    hidden_states, self_attn_weights, present_key_value = self.self_attn(
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/workspace/luzijia/GPTQ-for-LLaMa-triton/quant/fused_attn.py", line 154, in forward
    with torch.backends.cuda.sdp_kernel(enable_math=False):
AttributeError: module 'torch.backends.cuda' has no attribute 'sdp_kernel'
I met the same error. Have you solved it? Is this a problem with the torch version?
I got the same error on PyTorch 1.12.1. After I updated to 2.0.1, it was gone.
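For anyone landing here: `torch.backends.cuda.sdp_kernel` was only added in PyTorch 2.0, so 1.12.x raises this AttributeError, and upgrading is the clean fix. If you are stuck on an older torch, here is a minimal guard sketch (`sdp_ctx` is a hypothetical helper of mine, not code from this repo):

```python
# Sketch, assuming the guarded block only needs a no-op context on torch < 2.0.
import contextlib

import torch

def sdp_ctx():
    """Return the sdp_kernel context on torch >= 2.0, else a no-op context."""
    if hasattr(torch.backends.cuda, "sdp_kernel"):
        # PyTorch >= 2.0: disable the math backend, as fused_attn.py intends.
        return torch.backends.cuda.sdp_kernel(enable_math=False)
    # PyTorch < 2.0: the attribute does not exist, so fall back to a no-op.
    return contextlib.nullcontext()

with sdp_ctx():
    ...  # attention computation goes here
```

Note that `F.scaled_dot_product_attention` is itself PyTorch 2.0-only, so if the code inside the guarded block calls it (as fused_attn.py likely does), this sketch only moves the failure point; upgrading to 2.0+ is the real solution.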