FasterTransformer
Support model weights and calculations in bfloat16
Probably this is a long shot, but all new NVIDIA accelerators already support bfloat16, and there are some models like T5 that are actually trained on TPUs and have the option to export weights in bfloat16.
Running inference in bfloat16 would require fewer casts to float32 than float16 does, and therefore give better performance.
Currently, we don't see an obvious accuracy drop when running FP16 inference on a BF16-trained model. We also support BF16 on GPT: you can save the model as FP32 and run inference in BF16. We plan to support BF16 inference on more models in the future.
bfloat16 computation is supported in most models in the latest release. Because we don't have a good way to save bfloat16 weights yet, you still need to store the model as FP32 for now.
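For illustration, here is a minimal sketch of the "store as FP32" step described above. The directory layout and per-tensor .bin file naming are assumptions modeled on how FasterTransformer converter scripts typically dump raw binaries with numpy; they are not the exact converter code.

```python
# Hypothetical sketch: dump Hugging Face GPT-2 weights as raw FP32 binaries.
# File names and layout are illustrative assumptions, not the official converter's.
import os
import numpy as np
from transformers import GPT2LMHeadModel

model = GPT2LMHeadModel.from_pretrained("gpt2")
out_dir = "ft_weights/1-gpu"
os.makedirs(out_dir, exist_ok=True)

for name, param in model.state_dict().items():
    # Plain row-major FP32 buffers on disk; per the reply above, FT can then
    # run inference in BF16 from weights stored this way.
    array = param.cpu().numpy().astype(np.float32)
    array.tofile(os.path.join(out_dir, f"model.{name}.bin"))
```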
Are the activations in bfloat16 when inference is said to be in bfloat16 (except maybe some softmax layers)?
When users set data_type to bfloat16, the input/output of all layers are bfloat16, but FasterTransformer may use fp32 to compute/accumulate in some kernels, like GeLU.
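To illustrate that pattern (this is not FT's actual CUDA kernel), here is a numpy sketch of a GeLU whose inputs and outputs are bfloat16 but whose arithmetic runs in float32. The ml_dtypes package is used only because stock numpy has no bfloat16 dtype; it stands in for the kind of extension discussed later in this thread.

```python
# Illustrative sketch of "bf16 in/out, fp32 compute"; not FasterTransformer code.
import numpy as np
from ml_dtypes import bfloat16  # one numpy bfloat16 extension; an assumption here

def gelu_bf16_io(x_bf16: np.ndarray) -> np.ndarray:
    # Upcast the bf16 activations to fp32 for the actual math...
    x = x_bf16.astype(np.float32)
    y = 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))
    # ...then cast the result back down to bf16 for the next layer.
    return y.astype(bfloat16)

activations = np.random.randn(4, 8).astype(np.float32).astype(bfloat16)
out = gelu_bf16_io(activations)
print(out.dtype)  # bfloat16
```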
bfloat16 computation is supported in most models in the latest release. Because we don't have a good way to save bfloat16 weights yet, you still need to store the model as FP32 for now.
@byshiue do we support saving bfloat16 weights now?
you still need to store the model as FP32 for now. If we store the model as FP32, how do we then load those weights into the models implemented in FT C++?
Because we don't have a good way to save bfloat16 weights yet
Hi @byshiue, another quick question: since there are some numpy extensions for bfloat16 (such as bfloat16 and tensorstore), I wonder if we could just convert the Hugging Face weights to bfloat16 numpy arrays with these extensions? If not, may I ask why that is not a good way to save the weights?
Hi @void-main, FT does not support loading bfloat16 weights now because stock numpy does not support that data type. Using a numpy extension to support bfloat16 is a good idea. We will consider it; thank you for the suggestion.
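For reference, a minimal sketch of what that conversion could look like with a numpy bfloat16 extension (ml_dtypes here, as a stand-in for the bfloat16/tensorstore packages mentioned above). The output layout mirrors the hypothetical FP32 export sketch earlier and is an assumption; FT's loader would still need changes to read such files, so this is an illustration of the idea rather than a supported path.

```python
# Hypothetical sketch: export Hugging Face weights as raw bfloat16 binaries
# via a numpy bfloat16 extension. Not a format FT currently loads.
import os
import numpy as np
from ml_dtypes import bfloat16  # stand-in for the bfloat16/tensorstore extensions
from transformers import GPT2LMHeadModel

model = GPT2LMHeadModel.from_pretrained("gpt2")
out_dir = "ft_weights_bf16/1-gpu"
os.makedirs(out_dir, exist_ok=True)

for name, param in model.state_dict().items():
    # Cast through fp32 first, then down to bf16; the files are half the size
    # of the FP32 export above.
    array = param.cpu().numpy().astype(np.float32).astype(bfloat16)
    array.tofile(os.path.join(out_dir, f"model.{name}.bin"))
```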