bitsandbytes

I wonder how this compares and whether you could leverage the Nvidia Transformer Engine

Open LifeIsStrange opened this issue 3 years ago • 1 comments

In the 2023 architecture Hopper:

The Transformer Engine intelligently manages and dynamically chooses between FP8 and 16-bit calculations, automatically handling re-casting and scaling between FP8 and 16-bit in each layer to deliver up to 9x faster AI training and up to 30x faster AI inference speedups on large language models compared to the prior generation A100.

LifeIsStrange, Aug 21 '22 21:08

The transformer engine would be a perfect fit for this LLM.int8() algorithm. However, at this point, not enough details are known to say how big the advantage would be. Once all the details are released, we will assess how it can be integrated into bitsandbytes.

TimDettmers, Sep 05 '22 22:09

Hi @TimDettmers, is this something that would be worth looking into again? I'm out of my league here but willing to help in any way. FP8 on the 4090 would be big for the community.

epinnock, Jul 03 '23 16:07