Ray Huang
**Describe the bug** I am trying to run run_zero_quant for the GPT-J model, but I find the output model is not compressed; the model file size is the same as the original model,...
support gptneox int8 & shared-context gptj int8 - throws an exception when running in int8 mode; needs to be fixed
### Description

```shell
main branch, V100
```

Deployed docker pods crash and restart every few minutes. It seems stable when QPS is low. Below is the error log from before the pods crash, which...