TitanSneaker comments

TitanSneaker

Fused mlp causes assertion error

Same problem:

```
CUDA_VISIBLE_DEVICES=0 python llama_inference.py ./llama-hf/llama-7b --load llama7b-4bit-128g.pt --text "this is llama" --wbits 4 --groupsize 128
Loading model ...
Found 3 unique KN Linear values.
Warming up autotune cache...