pai4451

Results 3 issues of pai4451

Recently, HuggingFace `transformers` added a new feature, [int8 quantization](https://github.com/huggingface/transformers/pull/17901), for all HuggingFace models. This feature could reduce the size of large models by up to a factor of 2 without a...
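As a rough illustration of where a factor of 2 could come from, the sketch below compares the weight memory of a model stored at 2 bytes per parameter (fp16) against 1 byte per parameter (int8). The fp16 baseline, the 1B-parameter count, and the helper name `weight_memory_gib` are assumptions made for illustration and are not taken from the issue; the estimate also ignores activations and any layers kept in higher precision.

```python
def weight_memory_gib(num_params: int, bytes_per_param: int) -> float:
    """Approximate memory needed for the weights alone, in GiB."""
    return num_params * bytes_per_param / 2**30


# Hypothetical 1B-parameter model, chosen only to keep the numbers small.
num_params = 1_000_000_000

fp16_gib = weight_memory_gib(num_params, 2)  # assumed baseline: 2 bytes per weight
int8_gib = weight_memory_gib(num_params, 1)  # int8: 1 byte per weight
ratio = fp16_gib / int8_gib

print(f"fp16: {fp16_gib:.2f} GiB, int8: {int8_gib:.2f} GiB, ratio: {ratio:.1f}x")
```

Real savings are usually somewhat below this ideal ratio, which is consistent with the "up to" wording above.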

## 1. General Description

This PR intends to fix the conditional statement on [stream](https://platform.openai.com/docs/api-reference/completions/create#completions/create-stream), an OpenAI request parameter that controls whether the response is returned as server-sent events (SSE). The...
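The preview does not show the original condition or the fix, so the following is only a hypothetical sketch of one way such a check can be written. It assumes the request body has already been parsed into a dictionary; the function name `wants_stream` is invented for illustration. The point it shows is that testing the value of `stream`, rather than the mere presence of the key, keeps `{"stream": false}` from triggering an SSE response.

```python
from typing import Any, Mapping


def wants_stream(payload: Mapping[str, Any]) -> bool:
    """Return True only when the request explicitly sets stream to true.

    Hypothetical helper: `payload` is assumed to be the parsed JSON body.
    """
    # Check the value, not just key presence, so {"stream": False}
    # and a missing key both fall back to a non-streaming response.
    return payload.get("stream", False) is True


print(wants_stream({"prompt": "hi", "stream": True}))
print(wants_stream({"prompt": "hi", "stream": False}))
print(wants_stream({"prompt": "hi"}))
```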

Hi, thanks for supporting the BLOOM model in the latest release of the FasterTransformer backend. I tried the latest code on my 8x A6000 GPU server with 48 GB of memory per GPU...