auto-round
Speed up by disabling `low_gpu_mem_usage` and reduce memory usage by avoiding `torch.cat`.
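A minimal sketch (not the actual patch) of why repeated `torch.cat` inflates peak memory and how keeping the cached tensors in a list avoids it; the helper names are hypothetical:

```python
import torch

def collect_with_cat(batches):
    # Concatenating on every step reallocates a growing buffer,
    # briefly holding both the old and the new copy in memory.
    out = torch.empty(0, batches[0].shape[1])
    for b in batches:
        out = torch.cat([out, b], dim=0)
    return out

def collect_without_cat(batches):
    # Keeping the cached batches as a plain list skips the repeated
    # reallocation; consumers iterate the list entries instead.
    return list(batches)

batches = [torch.randn(4, 8) for _ in range(3)]
print(collect_with_cat(batches).shape)      # torch.Size([12, 8])
print(len(collect_without_cat(batches)))    # 3
```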
Smoke tests done:
llama3 with lm-head
baichuan13b with lm-head
chatglm3 (lm-head name transformer.output_layer)
opt tied lm-head
gemma-7b
phi-2 lm head
mixtral
Qwen1.5-7B-Chat lm-head
Baichuan2-7B-Chat lm-head
gpt-j-6b lm-head
LaMini-GPT-124M conv1d tied weight
gpt-neo-125m lm-head tied weight
dolly-v2-3b embed_out
stablelm-base-alpha-3 tied embed_out
bloom7b1 tied lm-head
Phi-3-mini-4k-instruct lm-head
solar lm-head
llama3_8b_instruct-chat lm-head
codegen25-7b-mult 4.33.2 lm-head