FasterTransformer
Plans to implement HF's int8 inference?
It would be great if someone could look into implementing this particular int8 scheme (the LLM.int8() quantization from bitsandbytes) for serving LLMs: https://huggingface.co/blog/hf-bitsandbytes-integration. A rough sketch of the Hugging Face-side usage is included below for reference.
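For context, this is roughly how the feature is exposed on the Hugging Face side (a minimal sketch, assuming `transformers`, `accelerate`, and `bitsandbytes` are installed; the model name is just an example, not part of the request):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "bigscience/bloom-3b"  # example model, chosen only for illustration

tokenizer = AutoTokenizer.from_pretrained(model_name)

# load_in_8bit=True enables LLM.int8() mixed-precision quantization:
# most weight matrices are stored in int8, while outlier feature
# dimensions are kept in fp16 and handled in a separate matmul.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    load_in_8bit=True,
)

inputs = tokenizer("Hello, my name is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

The ask is for FasterTransformer to support an equivalent weight-quantized inference path, so large models can be served with roughly half the GPU memory of fp16.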
Thank you for the suggestion. We will consider this optimization.