Cheng, Penghui

Results: 4 issues of Cheng, Penghui

What does this PR do? Add chatglm config in NormalizedConfigManager. Before submitting: - [ ] This PR fixes a typo or improves the docs (you can dismiss the...

Based on the suggestion in https://github.com/AutoGPTQ/AutoGPTQ/issues/597, we have implemented inference of GPTQ models on CPU devices. This PR will support Weight-Only quantization on CPU devices and inference with...
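For context on what weight-only quantization (WOQ) means: weights are stored as low-bit integers with a floating-point scale per group, while activations stay in floating point and the weights are dequantized during the matrix multiply. The following is a minimal pure-Python sketch of that idea, assuming symmetric 4-bit quantization with a group size of 4. The function names `quantize_weights` and `woq_linear` are hypothetical, and this is not the AutoGPTQ, AutoAWQ, or PR implementation; optimized CPU kernels typically pack the int4 values and fuse dequantization into the matmul.

```python
from typing import List, Tuple


def quantize_weights(row: List[float], bits: int = 4,
                     group_size: int = 4) -> Tuple[List[int], List[float]]:
    """Symmetric per-group quantization of one weight row (hypothetical helper)."""
    qmax = 2 ** (bits - 1) - 1  # 7 for 4-bit signed values
    q: List[int] = []
    scales: List[float] = []
    for start in range(0, len(row), group_size):
        group = row[start:start + group_size]
        max_abs = max(abs(w) for w in group)
        # One scale per group maps the largest magnitude onto qmax.
        scale = max_abs / qmax if max_abs > 0 else 1.0
        scales.append(scale)
        # Round to the nearest integer and clamp to the signed 4-bit range.
        q.extend(max(-qmax - 1, min(qmax, round(w / scale))) for w in group)
    return q, scales


def woq_linear(x: List[float], q_rows: List[List[int]],
               scale_rows: List[List[float]], group_size: int = 4) -> List[float]:
    """y = x @ W^T, with integer weights dequantized on the fly; x stays float."""
    out: List[float] = []
    for q, scales in zip(q_rows, scale_rows):
        acc = 0.0
        for i, (xi, qi) in enumerate(zip(x, q)):
            # Dequantize each weight with the scale of its group.
            acc += xi * qi * scales[i // group_size]
        out.append(acc)
    return out


if __name__ == "__main__":
    q, s = quantize_weights([0.1, -1.0, 0.3, 0.7, 2.0, -2.0, 1.2, 0.0])
    print(q)                               # integer weights
    print(woq_linear([1.0] * 8, [q], [s]))  # close to the float result 1.3
```

Only the integer weights and per-group scales need to be stored, which is where the memory saving comes from; the activation `x` is never quantized.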

Type of Change: feature, no API changed. Description: Removed the fallback of the lm_head op for WOQ. Expected Behavior & Potential Risk: Don't fall back lm_head when using weight-only quantization. ...

WIP

Based on the suggestion in https://github.com/casper-hansen/AutoAWQ/issues/390, we have implemented inference of AWQ models on CPU devices. This PR will support Weight-Only quantization on CPU devices and inference with...