CPUAdam fp16 and bf16 support

Open BacharL opened this issue 1 year ago • 3 comments

Hi. Please review the following changes: I added BF16 support to CPU Adam. BF16, FP16, and float are all supported at compile time, and the correct template instantiation is called at runtime according to the dtype of the input params.

BacharL avatar Apr 14 '24 12:04 BacharL

@BacharL, thanks for this incredible improvement to the offloading optimizers and op builders. I left a few comments and questions, but overall looks good to me.

tjruwase avatar May 04 '24 17:05 tjruwase

@tjruwase Thanks for reviewing this change. I have made changes to address your comments. There is no longer any need to pass HALF_DTYPE as a compiler define: all functions are now templated on ds_device_precision_t, and I removed all half_precision parameters.

BacharL avatar May 06 '24 11:05 BacharL
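
For readers following along, here is a minimal sketch of what templating an optimizer step on the parameter precision can look like, instead of selecting the half type with a compiler define. It is not the code from this PR: `bf16_t`, `to_float`, `from_float`, and `adam_step` are hypothetical names, the optimizer state is assumed to stay in float, and bias correction and weight decay are left out for brevity.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical bf16 storage type: the upper 16 bits of an IEEE-754 float.
struct bf16_t {
    std::uint16_t bits;
};

// Widen a stored value to float for the arithmetic.
inline float to_float(float v) { return v; }
inline float to_float(bf16_t v) {
    std::uint32_t u = static_cast<std::uint32_t>(v.bits) << 16;
    float f;
    std::memcpy(&f, &u, sizeof(f));
    return f;
}

// Narrow a float back to the storage type.
template <typename T>
T from_float(float f);

template <>
inline float from_float<float>(float f) { return f; }

template <>
inline bf16_t from_float<bf16_t>(float f) {
    std::uint32_t u;
    std::memcpy(&u, &f, sizeof(u));
    // Truncation keeps the sketch short; production code would round to nearest even.
    return bf16_t{static_cast<std::uint16_t>(u >> 16)};
}

// Simplified Adam step, templated on the parameter/gradient type.
// Assumption: the moment buffers are kept in float regardless of ParamT.
template <typename ParamT>
void adam_step(ParamT* params, const ParamT* grads,
               float* exp_avg, float* exp_avg_sq, std::size_t n,
               float lr, float beta1, float beta2, float eps) {
    for (std::size_t i = 0; i < n; ++i) {
        const float g = to_float(grads[i]);
        float p = to_float(params[i]);
        exp_avg[i] = beta1 * exp_avg[i] + (1.0f - beta1) * g;
        exp_avg_sq[i] = beta2 * exp_avg_sq[i] + (1.0f - beta2) * g * g;
        p -= lr * exp_avg[i] / (std::sqrt(exp_avg_sq[i]) + eps);
        params[i] = from_float<ParamT>(p);
    }
}
```

With this shape, `adam_step<float>` and `adam_step<bf16_t>` are both compiled into the same binary, so no per-dtype rebuild is needed; the choice between them can then be made at runtime.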

Added a templated invoker to help select the implementation. A map stores function pointers to the templated functions, keyed by the type enum. At initialization, the templates are instantiated for all supported dtypes and inserted into the map. I didn't clean up ds_adagrad_step_plus_copy and the related code under __ENABLE_CUDA__, and I also couldn't test it.

BacharL avatar May 08 '24 11:05 BacharL
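
The invoker pattern described in the last comment can be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's implementation: `DType`, `scale_impl`, `invokers`, and `invoke` are hypothetical names, and `float`/`double` stand in for the fp16/bf16/float set so the example compiles with the standard library alone.

```cpp
#include <cstddef>
#include <map>
#include <stdexcept>

// Hypothetical dtype enum used as the map key.
enum class DType { Float32, Float64 };

// A templated kernel; each supported dtype gets its own instantiation.
template <typename T>
void scale_impl(void* data, std::size_t n, double factor) {
    T* p = static_cast<T*>(data);
    for (std::size_t i = 0; i < n; ++i) {
        p[i] = static_cast<T>(p[i] * factor);
    }
}

// All instantiations share one type-erased signature.
using StepFn = void (*)(void*, std::size_t, double);

// The table is built once, on first use, with one entry per supported dtype.
inline const std::map<DType, StepFn>& invokers() {
    static const std::map<DType, StepFn> table = {
        {DType::Float32, &scale_impl<float>},
        {DType::Float64, &scale_impl<double>},
    };
    return table;
}

// Runtime dispatch: look up the instantiation that matches the dtype.
inline void invoke(DType dtype, void* data, std::size_t n, double factor) {
    const auto& table = invokers();
    const auto it = table.find(dtype);
    if (it == table.end()) {
        throw std::invalid_argument("unsupported dtype");
    }
    it->second(data, n, factor);
}
```

A caller would write something like `invoke(DType::Float32, buffer, count, 2.0)`. Supporting another dtype then means adding one map entry rather than another compile-time define.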