Ray Huang
**Describe the bug** I am trying to run run_zero_quant for the GPT-J model, but I find the output model is not compressed; the model file size is the same as the original model,...
support gptneox int8 & shared-context gptj int8 - throws an exception when running in int8 mode; needs to be fixed
### Description

```shell
main branch, V100
```

Deployed docker pods crash and restart every few minutes. It seems stable when QPS is low. Below is the error log from before the pods crash, which...