bitsandbytes
I wonder how this compares, and whether you could leverage the NVIDIA Transformer Engine.
In the 2023 Hopper architecture:

> The Transformer Engine intelligently manages and dynamically chooses between FP8 and 16-bit calculations, automatically handling re-casting and scaling between FP8 and 16-bit in each layer to deliver up to 9x faster AI training and up to 30x faster AI inference speedups on large language models compared to the prior generation A100.
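The per-layer re-casting and scaling described above can be illustrated with a small numpy sketch. This is only a model of the dynamic-range handling (track the tensor's `amax`, derive a scaling factor into FP8 range, clamp, then rescale back for the next 16-bit layer); it does not reproduce the Transformer Engine's actual kernels or the E4M3 bit layout, and the function name `fp8_cast` is a hypothetical helper, not a real API.

```python
import numpy as np

# E4M3 is one of the two FP8 formats used on Hopper; its largest
# representable magnitude is 448. We model only the dynamic range here,
# not mantissa rounding, so the round trip is exact within the clamp.
E4M3_MAX = 448.0

def fp8_cast(x, amax):
    """Scale a 16/32-bit tensor into E4M3 range and back (round trip)."""
    scale = E4M3_MAX / amax                          # scaling factor from tracked amax
    x_fp8 = np.clip(x * scale, -E4M3_MAX, E4M3_MAX)  # cast down, clamped to FP8 range
    return x_fp8 / scale                             # rescale for the next 16-bit layer

x = np.array([0.5, -3.2, 7.9], dtype=np.float32)
amax = np.abs(x).max()  # real engines track an amax history per layer
y = fp8_cast(x, amax)
```

Values within the tracked `amax` survive the round trip; anything beyond the scaled range is clamped, which is why choosing the scaling factor well matters.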
The Transformer Engine would be a perfect fit for the LLM.int8() algorithm. However, at this point, not enough details are known to say how big the advantage would be. Once all the details are released, we will assess how it can be integrated into bitsandbytes.
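For context on why the fit is natural: LLM.int8() uses a mixed-precision decomposition, where activation columns containing outliers stay in 16-bit while everything else goes through vector-wise int8 quantization. The numpy sketch below illustrates that decomposition under stated assumptions (the outlier threshold of 6.0 follows the LLM.int8() paper; `int8_matmul_mixed` is an illustrative name, not bitsandbytes' actual code):

```python
import numpy as np

def int8_matmul_mixed(X, W, threshold=6.0):
    """Illustrative LLM.int8()-style matmul: outlier columns in float,
    the rest via vector-wise int8 quantization."""
    # Columns of X that contain an outlier magnitude stay in 16-bit.
    outlier_cols = np.any(np.abs(X) >= threshold, axis=0)

    # Vector-wise quantization of the non-outlier part:
    Xq = X[:, ~outlier_cols]
    Wq = W[~outlier_cols, :]
    sx = np.maximum(np.abs(Xq).max(axis=1, keepdims=True), 1e-8) / 127.0  # row scales
    sw = np.maximum(np.abs(Wq).max(axis=0, keepdims=True), 1e-8) / 127.0  # column scales
    Xi = np.round(Xq / sx).astype(np.int8)
    Wi = np.round(Wq / sw).astype(np.int8)

    # Int8 matmul accumulated in int32, then dequantized with the outer
    # product of the row/column scales.
    int8_part = (Xi.astype(np.int32) @ Wi.astype(np.int32)) * sx * sw
    # Outlier columns are multiplied in full precision.
    fp16_part = X[:, outlier_cols] @ W[outlier_cols, :]
    return int8_part + fp16_part
```

Hopper's FP8 paths target the same trade-off (low-bit matmuls with per-tensor scaling), which is why the two approaches could complement each other.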
Hi @TimDettmers, is this something that would be worth looking into again? I'm out of my league here, but willing to help in any way. FP8 on the 4090 would be big for the community.