Ming Lin

2 issues by Ming Lin

Hello! I followed the D+S packing instructions and stored the packed .pt file in "~/models/${model_name}-squeezellm/packed_weight", where model_name="Llama-2-7b-chat-hf". When I load this model in vLLM: ``` python examples/llm_engine_example.py --dtype float16 --model ~/models/${model_name}-squeezellm/packed_weight...

This is really nice work! I followed the instructions to quantize Llama-2-7b-chat-hf. At the k-means clustering step, I ran the following command: ``` python nuq.py --bit 4 --model_type llama --model...