
New NVIDIA partnership: TE inference speedup

Open SinanAkkoyun opened this issue 1 year ago • 2 comments

Feature request

Hello, thank you for all the work! With the new NVIDIA partnership supplying H100 GPUs, could you please add FP8 Transformer Engine (TE) support for faster inference?
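For context, "FP8" here refers to the 8-bit E4M3 floating-point format (1 sign bit, 4 exponent bits, 3 mantissa bits) that Transformer Engine uses on H100 tensor cores. The real speedup comes from hardware kernels, but the rounding behavior of the format can be sketched in pure Python. This is only an illustration of E4M3 rounding, not TE's implementation:

```python
import math

E4M3_MAX = 448.0  # largest normal value representable in FP8 E4M3

def quantize_e4m3(x: float) -> float:
    """Round x to the nearest FP8 E4M3 value
    (1 sign bit, 4 exponent bits, 3 mantissa bits, exponent bias 7)."""
    if x == 0.0:
        return 0.0
    sign = -1.0 if x < 0 else 1.0
    v = min(abs(x), E4M3_MAX)          # saturate instead of overflowing
    e = math.floor(math.log2(v))       # exponent of the leading bit
    e = max(e, -6)                     # below 2**-6 values are subnormal
    m = round(v / 2.0**e * 8) / 8      # round mantissa to 3 fractional bits
    return sign * min(m * 2.0**e, E4M3_MAX)

# The coarse grid shows why per-tensor scaling matters for accuracy:
print(quantize_e4m3(0.3))    # -> 0.3125 (nearest E4M3 neighbor)
print(quantize_e4m3(500.0))  # -> 448.0  (saturated at the format max)
```

In practice TE pairs this format with per-tensor scaling factors (an "amax" history) so that each tensor's dynamic range is mapped onto the narrow E4M3 grid before matmuls.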

Motivation

It would significantly improve inference speed for Hugging Face Spaces and perhaps HuggingChat, making a faster HuggingChat more competitive with GPT-3.5 Turbo.

Thanks a lot

Your contribution

I am too inexperienced; if I attempted to PR such changes myself, I would definitely break many things.

SinanAkkoyun avatar Aug 11 '23 17:08 SinanAkkoyun

Yes, we will add TE kernels at some point. This is not a high priority right now and will have to wait for a later release.

OlivierDehaene avatar Sep 06 '23 13:09 OlivierDehaene

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

github-actions[bot] avatar Apr 09 '24 01:04 github-actions[bot]