text-generation-webui
GPT4 x Alpaca
Describe the bug
Error using the model: gpt-x-alpaca-13b-native-4bit-128g-cuda.pt
Is there an existing issue for this?
- [X] I have searched the existing issues
Reproduction
1. Clone the repository from https://huggingface.co/anon8231489123/gpt4-x-alpaca-13b-native-4bit-128g into the models path.
2. Download gpt-x-alpaca-13b-native-4bit-128g-cuda.pt (8 GB) into that folder.
3. Start the web UI with the arguments --wbits 4 --groupsize 128; without them, a different error occurs (no file named pytorch_model.bin found; see the sketch below for why).
The UI opens, but when I try to write something, it gets deleted and an error appears in the command window.
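The pytorch_model.bin error without the flags makes sense if the loader only takes the GPTQ path when --wbits is set. A minimal sketch of that branching, under the assumption that this is roughly how the webui dispatches between loaders (the names here are illustrative, not the webui's actual code):

```python
from dataclasses import dataclass

@dataclass
class Args:
    wbits: int = 0       # --wbits
    groupsize: int = -1  # --groupsize

def load_model(model_name: str, args: Args) -> str:
    # Hypothetical dispatch: only the quantized path can consume a bare
    # GPTQ .pt checkpoint; the default Transformers path looks for
    # pytorch_model.bin (or safetensors) and fails without it.
    if args.wbits > 0:
        return f"GPTQ path: load {model_name}-cuda.pt (wbits={args.wbits}, groupsize={args.groupsize})"
    return f"Transformers path: expects models/{model_name}/pytorch_model.bin"

print(load_model("gpt4-x-alpaca-13b-native-4bit-128g", Args(wbits=4, groupsize=128)))
```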
Screenshot
No response
Logs
To create a public link, set `share=True` in `launch()`.
Traceback (most recent call last):
File "D:\AIChatUI\text-generation-webui\modules\callbacks.py", line 66, in gentask
ret = self.mfunc(callback=_callback, **self.kwargs)
File "D:\AIChatUI\text-generation-webui\modules\text_generation.py", line 220, in generate_with_callback
shared.model.generate(**kwargs)
File "D:\AIChatUI\installer_files\env\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "D:\AIChatUI\installer_files\env\lib\site-packages\transformers\generation\utils.py", line 1485, in generate
return self.sample(
File "D:\AIChatUI\installer_files\env\lib\site-packages\transformers\generation\utils.py", line 2524, in sample
outputs = self(
File "D:\AIChatUI\installer_files\env\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "D:\AIChatUI\installer_files\env\lib\site-packages\transformers\models\llama\modeling_llama.py", line 687, in forward
outputs = self.model(
File "D:\AIChatUI\installer_files\env\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "D:\AIChatUI\installer_files\env\lib\site-packages\transformers\models\llama\modeling_llama.py", line 577, in forward
layer_outputs = decoder_layer(
File "D:\AIChatUI\installer_files\env\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "D:\AIChatUI\installer_files\env\lib\site-packages\transformers\models\llama\modeling_llama.py", line 292, in forward
hidden_states, self_attn_weights, present_key_value = self.self_attn(
File "D:\AIChatUI\installer_files\env\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "D:\AIChatUI\installer_files\env\lib\site-packages\transformers\models\llama\modeling_llama.py", line 196, in forward
query_states = self.q_proj(hidden_states).view(bsz, q_len, self.num_heads, self.head_dim).transpose(1, 2)
File "D:\AIChatUI\installer_files\env\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "D:\AIChatUI\text-generation-webui\repositories\GPTQ-for-LLaMa\quant.py", line 426, in forward
quant_cuda.vecquant4matmul(x.float(), self.qweight, out, self.scales.float(), zeros.float(), self.g_idx)
TypeError: vecquant4matmul(): incompatible function arguments. The following argument types are supported:
1. (arg0: torch.Tensor, arg1: torch.Tensor, arg2: torch.Tensor, arg3: torch.Tensor, arg4: torch.Tensor, arg5: int) -> None
Invoked with: tensor([[ 0.0120, -0.0418, 0.2859, ..., -0.0147, 0.0010, 0.0085],
[ 0.0258, 0.0073, 0.0054, ..., 0.0151, -0.0106, 0.0096],
[-0.0103, 0.0106, 0.0076, ..., 0.0028, 0.0162, 0.0219],
...,
[ 0.0081, 0.0318, 0.0299, ..., -0.0010, 0.0168, 0.0154],
[ 0.0273, -0.0054, 0.0301, ..., -0.0171, 0.0634, 0.0044],
[-0.0064, -0.0233, -0.0635, ..., 0.0368, -0.0215, 0.0078]],
device='cuda:0'), tensor([[-1398026309, 1248439994, 1968657271, ..., 1648788836,
1503146616, 1432982596],
[-1129530164, -1402222200, 1685349974, ..., 2016756323,
900172105, -2007726747],
[ -876888900, -1735723399, 1717986149, ..., -1236974524,
1117231658, -1988663128],
...,
[ 2040244922, 442721970, -1501410730, ..., -1466332823,
1110137158, -878212453],
[-1196906615, 2052409206, 1768056949, ..., 2126071976,
1109693461, -611755894],
[ 1735818137, -1669052488, -1469479036, ..., 1616880563,
1484147029, -931563094]], device='cuda:0', dtype=torch.int32), tensor([[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
...,
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.]], device='cuda:0'), tensor([[0.0145, 0.0080, 0.0104, ..., 0.0184, 0.0109, 0.0129],
[0.0069, 0.0054, 0.0059, ..., 0.0165, 0.0122, 0.0090],
[0.0114, 0.0084, 0.0121, ..., 0.0273, 0.0139, 0.0107],
...,
[0.0121, 0.0058, 0.0124, ..., 0.0211, 0.0107, 0.0137],
[0.0138, 0.0066, 0.0162, ..., 0.0180, 0.0140, 0.0112],
[0.0087, 0.0062, 0.0055, ..., 0.0152, 0.0119, 0.0112]],
device='cuda:0'), tensor([[0.1592, 0.0716, 0.0623, ..., 0.0922, 0.0763, 0.0777],
[0.0621, 0.0433, 0.0529, ..., 0.1486, 0.0730, 0.0538],
[0.0568, 0.0923, 0.1205, ..., 0.1641, 0.1248, 0.0749],
...,
[0.1088, 0.0405, 0.0622, ..., 0.2322, 0.0753, 0.1096],
[0.0551, 0.0530, 0.1946, ..., 0.1259, 0.0983, 0.0893],
[0.0787, 0.0436, 0.0327, ..., 0.1062, 0.0717, 0.0900]],
device='cuda:0'), tensor([ 0, 0, 0, ..., 39, 39, 39], device='cuda:0', dtype=torch.int32)
Output generated in 1.03 seconds (0.00 tokens/s, 0 tokens, context 43)
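Reading the TypeError at the end of the log: the compiled quant_cuda extension expects its sixth argument to be a plain int (the group size), but the GPTQ-for-LLaMa quant.py in use passes self.g_idx, a torch.int32 tensor, so the binding rejects the call. That suggests the compiled CUDA kernel and the checked-out quant.py come from mismatched versions; rebuilding the extension against the current branch (or checking out the matching commit) should align them. As a hedged illustration only, a shim that dispatches to whichever signature the installed extension actually exposes (call_vecquant4matmul is a made-up helper, not repo code):

```python
def call_vecquant4matmul(quant_cuda, x, qweight, out, scales, zeros, g_idx, groupsize):
    """Hypothetical compatibility shim, not part of GPTQ-for-LLaMa."""
    try:
        # Newer GPTQ-for-LLaMa builds take a g_idx tensor as the last argument.
        quant_cuda.vecquant4matmul(x, qweight, out, scales, zeros, g_idx)
    except TypeError:
        # Older builds take the group size as a plain int instead, which is
        # what the traceback above says the compiled extension expects.
        quant_cuda.vecquant4matmul(x, qweight, out, scales, zeros, groupsize)
```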
System Info
Windows 10
GIGABYTE GeForce RTX 4090
Don't quote me on this, but I think either your groupsize is off or you don't have GPTQ set up right... I think...
Can anyone run this model with oobabooga now? I saw a note on the Hugging Face model card saying the model is currently incompatible with oobabooga. Or am I wrong?
I have the same issue, on Windows 10 with a 24 GB VRAM card.
It runs fine on my system: RTX 3060 12 GB GPU, 16 GB RAM, Ryzen 5 5600G CPU.
Fine on mine (RTX 4090, 96 GB RAM, i9-10900K). You need to delete one of the .pt model files (the non-CUDA one).
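For context on the advice above, a minimal sketch of why two .pt checkpoints in one model folder can misfire, assuming the loader picks a checkpoint with a simple glob (an assumption about its behavior, not the webui's actual code):

```python
from pathlib import Path

def pick_checkpoint(model_dir: str) -> Path:
    # If both the CUDA and non-CUDA .pt files are present, a naive glob
    # can hand the loader the wrong one; deleting the non-CUDA file
    # removes the ambiguity.
    candidates = sorted(Path(model_dir).glob("*.pt"))
    if not candidates:
        raise FileNotFoundError(f"no .pt checkpoint in {model_dir}")
    if len(candidates) > 1:
        print(f"warning: multiple checkpoints found: {[p.name for p in candidates]}")
    return candidates[0]
```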
These are the files I have. And I still need a solution...
Runs fine on my 2080 Ti system, aside from lacking the VRAM to really push the context length, but I get this error on my M40.
> Fine on mine (RTX 4090, 96 GB RAM, i9-10900K). You need to delete one of the .pt model files (the non-CUDA one).
I'm trying almost the same environment; however, it keeps attempting to load the model into RAM instead of VRAM.
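If the model lands in RAM instead of VRAM, one quick thing to rule out is a CPU-only PyTorch build; a two-line check using only standard PyTorch calls:

```python
import torch

# False here means PyTorch was installed without CUDA support, so any
# model load will fall back to system RAM regardless of webui flags.
print(torch.cuda.is_available())
print(torch.version.cuda)  # None on CPU-only builds
```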
Same problem with all .pt models:
Invoked with: tensor([[ 0.0097, -0.0423, 0.2747, ..., -0.0144, 0.0021, 0.0083],
[ 0.0030, 0.0149, -0.0147, ..., 0.0014, -0.0061, -0.0047],
[-0.0071, 0.0223, -0.0016, ..., -0.0235, 0.0400, -0.0051],
...,
[-0.0144, -0.0331, 0.0077, ..., 0.0066, 0.0400, -0.0034],
[-0.0100, 0.0279, 0.0171, ..., 0.0138, -0.0362, -0.0109],
[-0.0053, -0.0219, -0.0596, ..., 0.0373, -0.0200, 0.0070]],
device='cuda:0'), tensor([[-1398026309, 1248436154, 1968657271, ..., 1648788836,
1503146616, 1432982596],
[-1146307380, -1418999416, 1702123094, ..., 2016756323,
631736649, -2007726747],
[ -876888644, -1735723655, 1449550693, ..., -1236974524,
1116183082, -1988663128],
...,
[-1634813764, 730166963, -1570613979, ..., -1448437126,
1126914374, -610817348],
[-1719031703, -1418118713, 928405381, ..., -1395955303,
1094030373, -895981414],
[ 1502051209, -1096394793, -1219852926, ..., 1887417526,
1484150852, -914646341]], device='cuda:0', dtype=torch.int32), tensor([[0., 0., 0., ..., 0., 0., 0.],
I am receiving the same error with this model: ethzanalytics/RedPajama-INCITE-Chat-3B-v1-GPTQ-4bit-128g
This issue has been closed due to inactivity for 6 weeks. If you believe it is still relevant, please leave a comment below. You can tag a developer in your comment.