neural-compressor
Enable Llama MoE models' GPTQ quantization
Type of Change
new feature
@YIYANGCAI please resolve the conflict. Will this PR target the v2.6 release?
Hi @YIYANGCAI, I saw that nn.Conv2d and nn.Conv1d are supported in GPTQ. Does that mean MoE models contain these two op types? I previously thought that only transformers.Conv1D was required.
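To make the op-type question concrete, here is a minimal sketch of how a GPTQ-style pass could discover quantizable layers in a model by module type. The supported-type list and the `ToyMoE` model are assumptions for illustration, not neural-compressor's actual internals:

```python
import torch.nn as nn

# Op types the discussion mentions as GPTQ-supported (assumption;
# real frameworks may also handle transformers.Conv1D and others).
GPTQ_SUPPORTED_TYPES = (nn.Linear, nn.Conv1d, nn.Conv2d)

def find_quantizable_layers(model: nn.Module) -> dict:
    """Return {qualified_name: module} for every supported layer."""
    return {
        name: module
        for name, module in model.named_modules()
        if isinstance(module, GPTQ_SUPPORTED_TYPES)
    }

# Toy MoE-style block (hypothetical): a router plus two expert MLPs.
class ToyMoE(nn.Module):
    def __init__(self, dim: int = 8, num_experts: int = 2):
        super().__init__()
        self.router = nn.Linear(dim, num_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, dim), nn.Linear(dim, dim))
            for _ in range(num_experts)
        )

layers = find_quantizable_layers(ToyMoE())
print(sorted(layers))  # router + the four expert Linear layers
```

If the MoE experts are plain `nn.Linear` layers (as in most Llama-style MoE implementations), a pass like this would pick them up without any Conv support at all; Conv types only matter for models that actually use them.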