
Support model weights and calculations in bfloat16

vlasenkoalexey opened this issue 2 years ago · 4 comments

Probably this is a long shot, but all new NVIDIA accelerators already support bfloat16, and some models like T5 are actually trained on TPUs and have the option to export weights in bfloat16.

Running inference in bfloat16 would require fewer casts to float32 than float16 does, and therefore give better performance.
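
As a quick illustration of why bfloat16 needs fewer float32 fallbacks: it keeps float32's exponent range, so magnitudes that overflow float16 are still representable. Below is a minimal Python sketch, assuming the third-party `ml_dtypes` package (not mentioned in this thread) for a numpy bfloat16 dtype:

```python
import numpy as np
import ml_dtypes  # third-party package that registers a numpy bfloat16 dtype

big = np.float32(1e5)                   # fits comfortably in float32
print(np.float16(big))                  # inf  -> float16 overflows above ~65504
print(big.astype(ml_dtypes.bfloat16))   # ~1e5 -> bfloat16 keeps the magnitude (coarser mantissa)
```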

vlasenkoalexey (Jun 02 '22)

Currently, we don't see an obvious quality drop when running FP16 inference on a BF16-trained model. We also support BF16 on GPT: you can save the model as FP32 and run inference in BF16. We plan to support BF16 inference on more models in the future.
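
For readers looking for the "store as FP32, run as BF16" workflow described above, here is a minimal sketch of the conversion side in Python. The Hugging Face model name and the .bin file naming are illustrative assumptions only; FasterTransformer's own converter scripts define the real checkpoint layout its C++ loader expects.

```python
import os
import torch
from transformers import AutoModelForCausalLM

# Load a (possibly bfloat16) Hugging Face checkpoint; "gpt2" is just a placeholder.
model = AutoModelForCausalLM.from_pretrained("gpt2", torch_dtype=torch.bfloat16)

os.makedirs("ft_weights_fp32", exist_ok=True)
for name, param in model.state_dict().items():
    # Upcast to float32 so plain numpy (no native bfloat16) can hold and serialize it.
    param.to(torch.float32).cpu().numpy().tofile(f"ft_weights_fp32/{name}.bin")
```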

byshiue (Jun 02 '22)

bfloat16 computation is supported in most models in the latest release. Because we don't have a good way to save bfloat16 weights yet, you still need to store the model as FP32.

byshiue (Aug 16 '22)

Are the activations in bfloat16 when inference is said to be in bfloat16 (except maybe some softmax layers)?

cayleyhamilton (Sep 12 '22)

When users set data_type to bfloat16, the inputs/outputs of all layers are bfloat16, but FasterTransformer may use FP32 to compute/accumulate in some kernels, like GeLU.
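
To make that mixed-precision pattern concrete, here is a rough numpy sketch of a GeLU with bfloat16 inputs/outputs but float32 internal math, again assuming the third-party `ml_dtypes` package; the real FasterTransformer kernels are CUDA, so this only illustrates the dtype flow.

```python
import numpy as np
import ml_dtypes

def gelu_bf16(x_bf16: np.ndarray) -> np.ndarray:
    x = x_bf16.astype(np.float32)  # upcast: compute/accumulate in float32
    y = 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))
    return y.astype(ml_dtypes.bfloat16)  # downcast: layer output stays bfloat16

activations = np.random.randn(4, 8).astype(ml_dtypes.bfloat16)
print(gelu_bf16(activations).dtype)  # bfloat16
```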

byshiue (Sep 13 '22)

> bfloat16 computation is supported in most models in the latest release. Because we don't have a good way to save bfloat16 weights yet, you still need to store the model as FP32.

@byshiue do we support saving bfloat16 weights now?

> you still need to store the model as FP32

If we store the model as FP32, how do we load those weights in the FT C++ model implementations?

void-main (Apr 30 '23)

> Because we don't have a good way to save bfloat16 weights yet

Hi @byshiue, another quick question: since there are some numpy extensions for bfloat16 (such as bfloat16 and tensorstore), I wonder if we could just convert the Hugging Face weights to bfloat16 numpy arrays with these extensions? If not, may I ask why it is not a good way to save the weights?
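
A minimal sketch of that idea, using the `ml_dtypes` extension here (a different bfloat16-for-numpy package than the two named above) since its API is straightforward; the model name and output file naming are illustrative, not FT's actual checkpoint layout:

```python
import os
import torch
import ml_dtypes
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("gpt2", torch_dtype=torch.bfloat16)

os.makedirs("ft_weights_bf16", exist_ok=True)
for name, param in model.state_dict().items():
    # torch tensor -> float32 numpy -> extension-provided bfloat16 numpy dtype
    arr = param.to(torch.float32).cpu().numpy().astype(ml_dtypes.bfloat16)
    with open(f"ft_weights_bf16/{name}.bin", "wb") as f:
        f.write(arr.tobytes())  # raw bfloat16 bytes, half the size of the FP32 dump
```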

void-main (Apr 30 '23)

Hi @void-main, FT does not support loading bfloat16 weights now because numpy does not support that data type. Using a numpy extension to support bfloat16 is a good idea. We will consider it, thank you for the suggestion.
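
For context on the numpy limitation: bfloat16 is just the upper 16 bits of float32, so one common workaround is to store the raw bit patterns in a uint16 array without any extension package. The sketch below only illustrates that idea; it is not something FasterTransformer does according to this thread.

```python
import numpy as np

def float32_to_bf16_bits(x: np.ndarray) -> np.ndarray:
    """Truncate float32 values to their bfloat16 bit patterns (round toward zero)."""
    return (x.astype(np.float32).view(np.uint32) >> 16).astype(np.uint16)

def bf16_bits_to_float32(bits: np.ndarray) -> np.ndarray:
    """Expand stored bfloat16 bit patterns back to float32."""
    return (bits.astype(np.uint32) << 16).view(np.float32)

w = np.random.randn(3).astype(np.float32)
print(w)
print(bf16_bits_to_float32(float32_to_bf16_bits(w)))  # matches w to ~2-3 significant digits
```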

byshiue (May 01 '23)