text-generation-inference
New NVIDIA partnership: TE inference speedup
Feature request
Hello, and thank you for all the work! With the new NVIDIA partnership supplying H100 GPUs, could you please add an FP8 inference speedup via TransformerEngine?
Motivation
This would significantly improve inference speed for Hugging Face Spaces and perhaps HuggingChat, making a faster HuggingChat more competitive with ChatGPT 3.5 Turbo.
Thanks a lot
Your contribution
I am too inexperienced to contribute this myself; if I attempted a PR for such changes, I would definitely break many things.
Yes, we will add TE kernels at some point. This is not a high priority for now and will have to wait for a later release.
This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.