TitanSneaker

1 comment by TitanSneaker

Same problem:

```
CUDA_VISIBLE_DEVICES=0 python llama_inference.py ./llama-hf/llama-7b --load llama7b-4bit-128g.pt --text "this is llama" --wbits 4 --groupsize 128
Loading model ...
Found 3 unique KN Linear values.
Warming up autotune cache...
```