FasterTransformer

Plans to implement HF's int8 inference?

Open • JOHW85 opened this issue 2 years ago • 1 comment

It would be great if someone could look into implementing this particular int8 scheme (LLM.int8(), as integrated into Hugging Face via bitsandbytes) for serving LLMs: https://huggingface.co/blog/hf-bitsandbytes-integration

JOHW85 • Aug 18 '22 04:08
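The "particular version of int8" in the linked post is LLM.int8(). It quantizes activations row-wise and weights column-wise using absmax scaling. Any hidden dimension whose activations reach a magnitude threshold is treated as an "outlier" and multiplied in higher precision. The pure-Python sketch below only illustrates that idea. The function names are hypothetical, and this is not the bitsandbytes or FasterTransformer code path.

```python
# Illustrative sketch of LLM.int8()-style matmul: vector-wise int8 quantization
# plus mixed-precision decomposition for outlier dimensions.
# Hypothetical helper names; not the bitsandbytes or FasterTransformer API.

def absmax_quantize(vec):
    """Symmetric absmax quantization of a vector into the int8 range [-127, 127]."""
    scale = max((abs(v) for v in vec), default=0.0)
    if scale == 0.0:
        return [0] * len(vec), 1.0
    return [round(127 * v / scale) for v in vec], scale


def int8_matmul_with_outliers(X, W, threshold=6.0):
    """Approximate X @ W (X: n x k, W: k x m).

    Columns of X (hidden dims) containing any |value| >= threshold are outliers
    and are multiplied in full precision; all other dims go through int8.
    """
    n, k, m = len(X), len(W), len(W[0])
    outlier_cols = [j for j in range(k) if any(abs(row[j]) >= threshold for row in X)]
    regular_cols = [j for j in range(k) if j not in outlier_cols]

    # Row-wise scales for X, column-wise scales for W (regular dims only).
    x_q = [absmax_quantize([row[j] for j in regular_cols]) for row in X]
    w_q = [absmax_quantize([W[j][c] for j in regular_cols]) for c in range(m)]

    out = [[0.0] * m for _ in range(n)]
    for i in range(n):
        xq, sx = x_q[i]
        for c in range(m):
            wq, sw = w_q[c]
            # Integer dot product (typically accumulated in int32 on GPUs),
            # then dequantized with the product of both scales.
            acc = sum(a * b for a, b in zip(xq, wq))
            int8_part = acc * (sx * sw) / (127 * 127)
            # Outlier dims are computed exactly in floating point.
            fp_part = sum(X[i][j] * W[j][c] for j in outlier_cols)
            out[i][c] = int8_part + fp_part
    return out
```

For example, with `X = [[1.0, 2.0, 10.0]]` the third hidden dimension exceeds the threshold, so only the first two go through int8. The default of 6.0 follows the threshold described for LLM.int8(). Treat it as a tunable assumption here.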

Thank you for the suggestion. We will consider this optimization.

byshiue • Aug 18 '22 06:08