GPTQ-for-LLaMa
Assertion `!(srcMmaLayout && dstMmaLayout) && "Unexpected mma -> mma layout conversion"' failed.
I converted the LLaMA weights and quantized them, but I got this error when I ran inference. Could someone help me and let me know how I can fix it? Thanks!
Here are the commands I ran.
python convert_llama_weights_to_hf.py --input_dir weights --model_size 7B --output_dir ./llama-hf
python llama.py llama-hf/llama-7b c4 --wbits 4 --true-sequential --act-order --groupsize 128 --save llama-hf/llama7b-4bit-128g.pt
python llama_inference.py llama-hf/llama-7b --wbits 4 --groupsize 128 --load llama-hf/llama7b-4bit-128g.pt --text "Once upon a time, " --device=0
Here is the error.
Loading model ...
Found 3 unique KN Linear values.
Warming up autotune cache ...
100% 12/12 [00:55<00:00, 4.60s/it]
Found 1 unique fused mlp KN values.
Warming up autotune cache ...
0% 0/12 [00:00<?, ?it/s]
python3: /project/lib/Analysis/Allocation.cpp:42: std::pair<llvm::SmallVector<unsigned int>, llvm::SmallVector<unsigned int> > mlir::triton::getCvtOrder(const mlir::Attribute&, const mlir::Attribute&): Assertion `!(srcMmaLayout && dstMmaLayout) && "Unexpected mma -> mma layout conversion"' failed.
This may be a duplicate of #179.
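Since the traceback shows the crash happening during the fused-MLP autotune warmup, disabling that kernel is the usual first workaround. Here is a minimal sketch of what that looks like, assuming the load_quant helper in this repo accepts a fused_mlp keyword (the exact signature may differ between checkouts, so treat the call below as illustrative):

```python
# Illustrative only: disable the fused Triton MLP kernel when loading
# the quantized checkpoint. Assumes load_quant(..., fused_mlp=...) exists
# in your checkout of GPTQ-for-LLaMa; adjust names and paths as needed.
from llama import load_quant

model = load_quant(
    "llama-hf/llama-7b",               # HF-format model directory
    "llama-hf/llama7b-4bit-128g.pt",   # 4-bit quantized checkpoint
    wbits=4,
    groupsize=128,
    fused_mlp=False,  # skip the fused-MLP warmup that hits the mma -> mma assertion
)
```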
Setting fused_mlp=False doesn't work for me…