
Assertion `!(srcMmaLayout && dstMmaLayout) && "Unexpected mma -> mma layout conversion"' failed.

Open · chigkim opened this issue 1 year ago · 2 comments

I converted the Llama weights and quantized them, but I got this error when I ran inference. Could someone help me figure out how to fix it? Thanks!

Here are the commands I ran.

python convert_llama_weights_to_hf.py --input_dir weights --model_size 7B --output_dir ./llama-hf
python llama.py llama-hf/llama-7b c4 --wbits 4 --true-sequential --act-order --groupsize 128 --save llama-hf/llama7b-4bit-128g.pt
python llama_inference.py llama-hf/llama-7b --wbits 4 --groupsize 128 --load llama-hf/llama7b-4bit-128g.pt --text "Once upon a time, " --device=0

Here is the error.

Loading model ...
Found 3 unique KN Linear values.
Warming up autotune cache ...
100% 12/12 [00:55<00:00,  4.60s/it]
Found 1 unique fused mlp KN values.
Warming up autotune cache ...
  0% 0/12 [00:00<?, ?it/s]
python3: /project/lib/Analysis/Allocation.cpp:42: std::pair<llvm::SmallVector<unsigned int>, llvm::SmallVector<unsigned int> > mlir::triton::getCvtOrder(const mlir::Attribute&, const mlir::Attribute&): Assertion `!(srcMmaLayout && dstMmaLayout) && "Unexpected mma -> mma layout conversion"' failed.

chigkim · May 01 '23 11:05
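(Editor's note: the assertion is raised inside Triton's compiler, note the path lib/Analysis/Allocation.cpp, not in GPTQ-for-LLaMa itself, while the fused-MLP kernels are compiled during autotune warmup. One cheap check, my suggestion rather than something from this thread, is which triton build is installed, since a version mismatch between Triton and the GPTQ-for-LLaMa checkout is a plausible cause:

import triton
# Compare against the triton version your GPTQ-for-LLaMa checkout expects
print(triton.__version__)

)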

This may be a duplicate of #179.

edwardzjl · May 04 '23 07:05

Setting fused_mlp=False doesn't work for me…

LuciaIsFine · Jul 05 '23 02:07
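
(Editor's note: the fused_mlp=False workaround mentioned above refers to the fused_mlp keyword of load_quant in llama_inference.py. Below is a minimal sketch of loading with the fused MLP path disabled; it assumes the triton branch's load_quant signature, including the fused_mlp and warmup_autotune keywords, so verify against your checkout before relying on it:

import torch
from transformers import AutoTokenizer
from llama_inference import load_quant  # GPTQ-for-LLaMa script

DEV = torch.device("cuda:0")

# fused_mlp=False skips the fused MLP kernels whose autotune warmup
# trips the mma -> mma assertion; warmup_autotune=False additionally
# skips warmup for the remaining kernels (both keywords are assumed
# from the triton branch and may differ in other checkouts).
model = load_quant(
    "llama-hf/llama-7b",
    "llama-hf/llama7b-4bit-128g.pt",
    wbits=4,
    groupsize=128,
    fused_mlp=False,
    warmup_autotune=False,
)
model.to(DEV)

tokenizer = AutoTokenizer.from_pretrained("llama-hf/llama-7b")
inputs = tokenizer("Once upon a time, ", return_tensors="pt").to(DEV)
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0]))

As LuciaIsFine notes above, this did not fix the crash for everyone, so treat it as a diagnostic step rather than a guaranteed fix.)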