MOSS
After some tinkering, the backend throws `TypeError: '<' not supported between instances of 'tuple' and 'float'`
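After some tinkering I got the int4 model to load, but when I submit a prompt from the browser the backend throws the error below. Environment: Ubuntu 22.04, NVIDIA-SMI 530.41.03.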
```
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:07:16_PDT_2019
Cuda compilation tools, release 10.1, V10.1.243
```
```
~/MOSS$ python moss_gui_demo.py
Waiting for all devices to be ready, it may take a few minutes...
Running on local URL: http://0.0.0.0:6006
Running on public URL: https://7b29d06f6fba682b.gradio.live
This share link expires in 72 hours. For free permanent hosting and GPU upgrades (NEW!), check out Spaces: https://huggingface.co/spaces
Traceback (most recent call last):
  File "/home/good/anaconda3/envs/moss/lib/python3.8/site-packages/gradio/routes.py", line 401, in run_predict
    output = await app.get_blocks().process_api(
  File "/home/good/anaconda3/envs/moss/lib/python3.8/site-packages/gradio/blocks.py", line 1302, in process_api
    result = await self.call_function(
  File "/home/good/anaconda3/envs/moss/lib/python3.8/site-packages/gradio/blocks.py", line 1025, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "/home/good/anaconda3/envs/moss/lib/python3.8/site-packages/anyio/to_thread.py", line 31, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "/home/good/anaconda3/envs/moss/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 937, in run_sync_in_worker_thread
    return await future
  File "/home/good/anaconda3/envs/moss/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 867, in run
    result = context.run(func, *args)
  File "moss_gui_demo.py", line 122, in predict
    outputs = model.generate(
  File "/home/good/anaconda3/envs/moss/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 28, in decorate_context
    return func(*args, **kwargs)
  File "/home/good/anaconda3/envs/moss/lib/python3.8/site-packages/transformers/generation/utils.py", line 1571, in generate
    return self.sample(
  File "/home/good/anaconda3/envs/moss/lib/python3.8/site-packages/transformers/generation/utils.py", line 2534, in sample
    outputs = self(
  File "/home/good/anaconda3/envs/moss/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/good/MOSS/models/modeling_moss.py", line 678, in forward
    transformer_outputs = self.transformer(
  File "/home/good/anaconda3/envs/moss/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/good/MOSS/models/modeling_moss.py", line 545, in forward
    outputs = block(
  File "/home/good/anaconda3/envs/moss/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/good/MOSS/models/modeling_moss.py", line 270, in forward
    attn_outputs = self.attn(
  File "/home/good/anaconda3/envs/moss/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/good/MOSS/models/modeling_moss.py", line 164, in forward
    qkv = self.qkv_proj(hidden_states)
  File "/home/good/anaconda3/envs/moss/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/good/MOSS/models/quantization.py", line 371, in forward
    out = QuantLinearFunction.apply(x.reshape(-1, x.shape[-1]), self.qweight, self.scales,
  File "/home/good/anaconda3/envs/moss/lib/python3.8/site-packages/torch/cuda/amp/autocast_mode.py", line 94, in decorate_fwd
    return fwd(*args, **kwargs)
  File "/home/good/MOSS/models/quantization.py", line 283, in forward
    output = matmul248(input, qweight, scales, qzeros, g_idx, bits, maxq)
  File "/home/good/MOSS/models/quantization.py", line 254, in matmul248
    matmul_248_kernel[grid](input, qweight, output,
  File "/home/good/MOSS/models/custom_autotune.py", line 93, in run
    self.cache[key] = builtins.min(timings, key=timings.get)
TypeError: '<' not supported between instances of 'tuple' and 'float'
```
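The last frame shows where it breaks: `builtins.min(timings, key=timings.get)` picks the fastest kernel config by comparing benchmark results, and Python cannot order a tuple against a float. Below is a minimal pure-Python reproduction, assuming (not confirmed in this thread) that some configs end up with a tuple of timings while others get a bare float. The dict contents and the `as_scalar` helper are hypothetical, and this is only an illustration of the error, not the patch from issue #65 or #129.

```python
import builtins

# Hypothetical benchmark results keyed by kernel config: one entry is a
# bare float (e.g. a fallback value), the other a tuple of timings.
timings = {
    "config_a": float("inf"),        # e.g. an "out of resources" fallback
    "config_b": (0.12, 0.10, 0.15),  # e.g. (median, p20, p80) in ms
}


def pick_best_naive(timings):
    """Same selection as in the traceback: min over keys by their value."""
    return builtins.min(timings, key=timings.get)


def as_scalar(value):
    """Hypothetical helper: collapse a tuple result to its first element."""
    return float(value[0]) if isinstance(value, tuple) else float(value)


def pick_best_normalized(timings):
    """Compare configs on one float each, so mixed value types cannot clash."""
    return builtins.min(timings, key=lambda cfg: as_scalar(timings[cfg]))


try:
    pick_best_naive(timings)
except TypeError as exc:
    print(exc)  # '<' not supported between instances of 'tuple' and 'float'

print(pick_best_normalized(timings))  # config_b
```

`min` compares each new value against the current minimum (`new < current`), so with the float seen first and the tuple second the message matches the traceback exactly. Whatever the real fix looks like, the values in `timings` need to be mutually comparable.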
Same issue.
Same error here.
Please see this issue: https://github.com/OpenLMLab/MOSS/issues/65
Following issue https://github.com/OpenLMLab/MOSS/issues/65, I commented out part of `models/custom_autotune.py` (the edited line is below), but I still get the following error:
```python
except #triton.compiler.OutOfResources:
    return float('inf')
```
```
$ python3 moss_cli_demo.py
/usr/lib/python3/dist-packages/requests/__init__.py:89: RequestsDependencyWarning: urllib3 (1.26.15) or chardet (3.0.4) doesn't match a supported version!
  warnings.warn("urllib3 ({}) or chardet ({}) doesn't match a supported "
Waiting for all devices to be ready, it may take a few minutes...
triton not installed. Run pip install triton to load quantized version of MOSS.
Traceback (most recent call last):
  File "moss_cli_demo.py", line 30, in
```
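The warning path `/usr/lib/python3/dist-packages/...` suggests this run used the system `python3` rather than the `moss` conda environment from the earlier GUI traceback, which would explain the "triton not installed" message. A small sketch for checking which interpreter is running and whether it can see `triton`, assuming only the standard library (Python 3.8+); `triton_status` is a hypothetical helper, not part of MOSS.

```python
import importlib.util
import sys
from importlib import metadata


def triton_status():
    """Return the installed triton version string, or None if not importable."""
    # find_spec returns None when this interpreter cannot import the package.
    if importlib.util.find_spec("triton") is None:
        return None
    try:
        return metadata.version("triton")
    except metadata.PackageNotFoundError:
        # Importable, but no distribution metadata (e.g. a source checkout).
        return "unknown"


if __name__ == "__main__":
    print("python:", sys.executable)
    print("triton:", triton_status() or "not installed")
```

Running it with the same command used for the demo (`python3 ...` versus `python ...` inside the activated env) shows whether both point at the same environment.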
Updated.
Updated.
Still the same problem.
I have the same problem too.
I have the same problem too.
I fixed it using the solution from this expert: https://github.com/OpenLMLab/MOSS/issues/129#issuecomment-1535899953