pyllama
Error trying to quantize 7B model to 2-bit
I have installed GPTQ as described at https://pypi.org/project/gptq/#description, but the following error comes out after executing python -m llama.llama_quant D:\Repo\Llama\weights\7B c4 --wbits 2 --save pyllama-7B2b.pt:
Traceback (most recent call last):
  File "C:\Users\ASUS\.conda\envs\PyLlama\lib\runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\ASUS\.conda\envs\PyLlama\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "D:\Repo\PyLlama\pyllama\llama\llama_quant.py", line 6, in <module>
    from gptq import (
  File "C:\Users\ASUS\.conda\envs\PyLlama\lib\site-packages\gptq\__init__.py", line 9, in <module>
    from .gptq import GPTQ
  File "C:\Users\ASUS\.conda\envs\PyLlama\lib\site-packages\gptq\gptq.py", line 5, in <module>
    from .quant import quantize
  File "C:\Users\ASUS\.conda\envs\PyLlama\lib\site-packages\gptq\quant.py", line 4, in <module>
    from quant_cuda import matvmul2, matvmul3, matvmul4, matvmul8, matvmul16
ModuleNotFoundError: No module named 'quant_cuda'
I am using Windows 11 as my OS.
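One way to narrow this down is to check which pieces of the installation Python can actually find. The sketch below is stdlib-only; the module names come straight from the traceback (quant_cuda is the compiled CUDA extension that gptq's setup is supposed to build, and it can be missing even when the pure-Python parts of gptq installed fine, e.g. if the kernel build failed silently on Windows):

```python
# Minimal diagnostic sketch: report which of the modules from the traceback
# Python can locate. If gptq is "found" but quant_cuda is "MISSING", the
# package installed without its compiled CUDA extension, which matches the
# ModuleNotFoundError above.
import importlib.util


def module_available(name: str) -> bool:
    """True if Python can locate a top-level module/extension with this name."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    for mod in ("torch", "gptq", "quant_cuda"):
        status = "found" if module_available(mod) else "MISSING"
        print(f"{mod}: {status}")
```

If quant_cuda is the only missing one, a clean reinstall of gptq (as suggested below) that rebuilds the extension is the likely fix.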
I didn't test on Windows 11, but it should work if you have a GPU. Can you double-check that your gptq installation completed successfully?
Simply uninstall GPTQ completely and then reinstall it to solve this problem.
When I tried to quantize the 7B model to 2-bit, I got a weird error:
Loading checkpoint shards: 100%|██████████| 33/33 [00:44<00:00, 1.34s/it]
Found cached dataset json (C:/Users/TorchYang/.cache/huggingface/datasets/allenai___json/allenai--c4-6fbe877195f42de5/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51)
Found cached dataset json (C:/Users/TorchYang/.cache/huggingface/datasets/allenai___json/allenai--c4-efc3d4f4606f44bd/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51)
Traceback (most recent call last):
  File "D:\SPACE_Research_AI\QutaModel_TransformerBased\Model_quta.py", line 505, in <module>
    run()
  File "D:\SPACE_Research_AI\QutaModel_TransformerBased\Model_quta.py", line 460, in run
    dataloader, testloader = get_loaders(
  File "D:\Python\Python3_10_8\lib\site-packages\gptq\datautils.py", line 112, in get_loaders
    return get_c4(nsamples, seed, seqlen, model, tokenizer)
  File "D:\Python\Python3_10_8\lib\site-packages\gptq\datautils.py", line 67, in get_c4
    tokenizer = tokenizer or AutoTokenizer.from_pretrained(model, use_fast=False)
  File "D:\Python\Python3_10_8\lib\site-packages\transformers\models\auto\tokenization_auto.py", line 676, in from_pretrained
    raise ValueError(
ValueError: Tokenizer class LLaMATokenizer does not exist or is not currently imported.
Model_quta.py is actually llama_quant.py from your code. Can you help me?
Facing the exact same issue as @DirtyKnightForVi
Follow this link: https://github.com/juncongmoo/pyllama/issues/35. You'd better uninstall transformers completely and reinstall it using 'pip install git+https://github.com/mbehm/transformers'.