How to use FP8 of TransformerEngine in inference

Open Godlovecui opened this issue 1 year ago • 3 comments

ENV: 8x RTX 4090

I want to test TransformerEngine's FP8 support with Llama 3 (from Hugging Face) for inference, but I cannot find any instructions for inference. Can you share some code? Thank you~

Jun 13 '24 08:06 Godlovecui

We are working on a tutorial for inference with Gemma: https://github.com/NVIDIA/TransformerEngine/blob/5cb8ed4d129245357363361947e5b1d31c543783/docs/examples/te_gemma/tutorial_generation_gemma_with_te.ipynb. We're still tweaking it, so we'd appreciate any feedback at https://github.com/NVIDIA/TransformerEngine/pull/829.

Jun 13 '24 18:06 timmoon10
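
Until the tutorial lands, here is a minimal sketch of what an FP8 forward pass at inference time might look like. It is not taken from the linked notebook: it runs a single `te.Linear` layer rather than Llama 3, and the helper names (`te_available`, `fp8_capable`, `run_fp8_linear`), the recipe settings, and the tensor sizes are illustrative assumptions. It also assumes that an Ada (sm89) or Hopper (sm90) GPU with a recent enough CUDA/cuBLAS is needed for FP8, and that FP8 GEMMs want the batch dimension divisible by 8 and the feature dimension divisible by 16.

```python
import importlib.util


def te_available() -> bool:
    """True when both torch and transformer_engine can be imported."""
    return all(
        importlib.util.find_spec(name) is not None
        for name in ("torch", "transformer_engine")
    )


def fp8_capable() -> bool:
    """Rough check (assumption): FP8 needs compute capability 8.9 or newer."""
    if not te_available():
        return False
    import torch

    if not torch.cuda.is_available():
        return False
    return torch.cuda.get_device_capability() >= (8, 9)


def run_fp8_linear(batch: int = 16, hidden: int = 1024):
    """Run one te.Linear forward pass under FP8 autocast and return the output shape."""
    import torch
    import transformer_engine.pytorch as te
    from transformer_engine.common import recipe

    # Delayed scaling with the HYBRID format (E4M3 forward, E5M2 backward).
    # These values are illustrative, not tuned.
    fp8_recipe = recipe.DelayedScaling(
        fp8_format=recipe.Format.HYBRID,
        amax_history_len=16,
        amax_compute_algo="max",
    )

    layer = te.Linear(hidden, hidden, bias=True, params_dtype=torch.bfloat16)
    layer = layer.cuda().eval()
    x = torch.randn(batch, hidden, device="cuda", dtype=torch.bfloat16)

    # No gradients are needed for inference; fp8_autocast switches the GEMM to FP8.
    with torch.no_grad(), te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
        y = layer(x)
    return tuple(y.shape)


if __name__ == "__main__":
    if fp8_capable():
        print("FP8 output shape:", run_fp8_linear())
    else:
        print("Skipping: transformer_engine or an FP8-capable GPU is not available.")
```

For a full Hugging Face model the same `fp8_autocast` context would wrap the model's forward call, but only modules that are actually TE layers run in FP8, so the decoder layers would first have to be replaced with TE equivalents.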

Hi, can TransformerEngine be compiled into a pip package? I want to use TransformerEngine in vLLM. Thank you~ @timmoon10

Jun 19 '24 03:06 Godlovecui

@Godlovecui We will be releasing a pip wheel for TE in the next few months.

Sep 05 '24 16:09 sbhavani
