
New NVIDIA partnership: TE inference speedup

Open SinanAkkoyun opened this issue 1 year ago • 2 comments

Feature request

Hello, thank you for all the work! With the new NVIDIA partnership supplying H100 GPUs, could you please add FP8 Transformer Engine (TE) support for faster inference?
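For context, "FP8" here refers to the 8-bit E4M3 floating-point format (1 sign bit, 4 exponent bits, 3 mantissa bits) that Transformer Engine uses on H100 tensor cores. The real speedup comes from hardware kernels, but the rounding behavior of the format can be sketched in pure Python. This is only an illustration of E4M3 rounding, not TE's implementation:

```python
import math

E4M3_MAX = 448.0  # largest normal value representable in FP8 E4M3

def quantize_e4m3(x: float) -> float:
    """Round x to the nearest FP8 E4M3 value
    (1 sign bit, 4 exponent bits, 3 mantissa bits, exponent bias 7)."""
    if x == 0.0:
        return 0.0
    sign = -1.0 if x < 0 else 1.0
    v = min(abs(x), E4M3_MAX)          # saturate instead of overflowing
    e = math.floor(math.log2(v))       # exponent of the leading bit
    e = max(e, -6)                     # below 2**-6 values are subnormal
    m = round(v / 2.0**e * 8) / 8      # round mantissa to 3 fractional bits
    return sign * min(m * 2.0**e, E4M3_MAX)

# The coarse grid shows why per-tensor scaling matters for accuracy:
print(quantize_e4m3(0.3))    # -> 0.3125 (nearest E4M3 neighbor)
print(quantize_e4m3(500.0))  # -> 448.0  (saturated at the format max)
```

In practice TE pairs this format with per-tensor scaling factors (an "amax" history) so that each tensor's dynamic range is mapped onto the narrow E4M3 grid before matmuls.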

Motivation

It would significantly improve inference speed for Hugging Face Spaces and perhaps HuggingChat, making a faster HuggingChat more competitive with GPT-3.5 Turbo.

Thanks a lot

Your contribution

I am too inexperienced; if I attempted to PR such changes myself, I would definitely break many things.

SinanAkkoyun avatar Aug 11 '23 17:08 SinanAkkoyun

Yes, we will add TE kernels at some point. This is not a high priority right now and will have to wait for a later release.

OlivierDehaene avatar Sep 06 '23 13:09 OlivierDehaene

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

github-actions[bot] avatar Apr 09 '24 01:04 github-actions[bot]