FasterTransformer
Plans to implement HF's int8 inference?
It would be great if someone could look into implementing this particular int8 scheme (the LLM.int8() quantization from bitsandbytes) for serving LLMs: https://huggingface.co/blog/hf-bitsandbytes-integration. A rough sketch of the Hugging Face-side usage is included below for reference.
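For context, this is roughly how the feature is exposed on the Hugging Face side (a minimal sketch, assuming `transformers`, `accelerate`, and `bitsandbytes` are installed; the model name is just an example, not part of the request):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "bigscience/bloom-3b"  # example model, chosen only for illustration

tokenizer = AutoTokenizer.from_pretrained(model_name)

# load_in_8bit=True enables LLM.int8() mixed-precision quantization:
# most weight matrices are stored in int8, while outlier feature
# dimensions are kept in fp16 and handled in a separate matmul.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    load_in_8bit=True,
)

inputs = tokenizer("Hello, my name is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

The ask is for FasterTransformer to support an equivalent weight-quantized inference path, so large models can be served with roughly half the GPU memory of fp16.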
Thank you for the suggestion. We will consider this optimization.