auto-round
Speed up by disabling `low_gpu_mem_usage` and reduce memory usage by avoiding `torch.cat`.
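A minimal sketch (not the actual patch) of why repeated `torch.cat` inflates peak memory and how keeping the cached tensors in a list avoids it; the helper names are hypothetical:

```python
import torch

def collect_with_cat(batches):
    # Concatenating on every step reallocates a growing buffer,
    # briefly holding both the old and the new copy in memory.
    out = torch.empty(0, batches[0].shape[1])
    for b in batches:
        out = torch.cat([out, b], dim=0)
    return out

def collect_without_cat(batches):
    # Keeping the cached batches as a plain list skips the repeated
    # reallocation; consumers iterate the list entries instead.
    return list(batches)

batches = [torch.randn(4, 8) for _ in range(3)]
print(collect_with_cat(batches).shape)      # torch.Size([12, 8])
print(len(collect_without_cat(batches)))    # 3
```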
Smoke tests done:
llama3 with lm-head
baichuan13b with lm-head
chatglm3 (lm-head name transformer.output_layer)
opt tied lm-head
gemma-7b
phi-2 lm head
mixtral
Qwen1.5-7B-Chat lm-head
Baichuan2-7B-Chat lm-head
gpt-j-6b lm-head
LaMini-GPT-124M conv1d tied weight
gpt-neo-125m lm-head tied weight
dolly-v2-3b embed_out
stablelm-base-alpha-3 tied embed_out
bloom7b1 tied lm-head
Phi-3-mini-4k-instruct lm-head
solar lm-head
llama3_8b_instruct-chat lm-head
codegen25-7b-mult 4.33.2 lm-head