Ray Huang

Results 4 issues of Ray Huang

**Describe the bug** I am trying run_zero_quant on the GPT-J model, but the output model is not compressed: the model file size is the same as the original model,...

bug
compression

Support gptneox int8 and share context gptj int8. This will throw an exception when running in int8 mode, which will need to be fixed.

### Description ```shell main branch, V100 Deployed Docker pods crash and restart every few minutes. It seems stable when QPS is low. Below is the error log before the pods crash, which...

bug