TransformerEngine
How to use TransformerEngine FP8 for inference
ENV: 8x RTX 4090
I want to test TransformerEngine's FP8 with Llama 3 (from Hugging Face) for inference, but I cannot find any instructions on inference. Can you share some code? Thank you~
We are working on a tutorial for inference with Gemma: https://github.com/NVIDIA/TransformerEngine/blob/5cb8ed4d129245357363361947e5b1d31c543783/docs/examples/te_gemma/tutorial_generation_gemma_with_te.ipynb. We're still tweaking it, so we'd appreciate any feedback at https://github.com/NVIDIA/TransformerEngine/pull/829.
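In the meantime, here is a minimal sketch of what FP8 inference with TE's PyTorch API generally looks like (this is not taken from the tutorial above; the layer sizes, dtypes, and recipe settings are placeholders for illustration):

```python
# Minimal sketch: FP8 forward pass through a TE TransformerLayer under fp8_autocast.
import torch
import transformer_engine.pytorch as te
from transformer_engine.common import recipe

# Hypothetical layer dimensions, chosen only for illustration.
hidden_size, ffn_hidden_size, num_heads = 4096, 11008, 32

layer = te.TransformerLayer(
    hidden_size,
    ffn_hidden_size,
    num_heads,
    params_dtype=torch.bfloat16,
).cuda().eval()

# FP8 recipe: delayed scaling with the default HYBRID format (E4M3 fwd / E5M2 bwd).
fp8_recipe = recipe.DelayedScaling(margin=0, fp8_format=recipe.Format.HYBRID)

# Dummy input of shape (sequence_length, batch_size, hidden_size).
x = torch.randn(128, 1, hidden_size, dtype=torch.bfloat16, device="cuda")

with torch.no_grad(), te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
    out = layer(x)

print(out.shape)
```

For a full Llama/Gemma-style generation loop (weight loading from Hugging Face checkpoints, KV caching, etc.), the linked tutorial is the better reference.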
Hi, can TransformerEngine be compiled into a pip package? I want to use TransformerEngine in vLLM. Thank you~ @timmoon10
@Godlovecui We will be releasing a pip wheel for TE in the next few months.