Results: 2 issues of Ming Lin
Hello! I followed the D+S packing instructions and stored the packed .pt file in "~/models/${model_name}-squeezellm/packed_weight", where model_name="Llama-2-7b-chat-hf". When I load this model in vLLM: ``` python examples/llm_engine_example.py --dtype float16 --model ~/models/${model_name}-squeezellm/packed_weight...
This is really nice work! I followed the instructions to quantize Llama-2-7b-chat-hf. At the k-means clustering step, I ran the following command: ``` python nuq.py --bit 4 --model_type llama --model...