DeepSpeed Add Compressedbackend for Onebit optimizers

In the process of adding onebit optimizers support for XPU devices, we have noticed that for different accelerator, the main difference of implementation of compressed_allreduce lies on packbits and unpackbits. CUDA uses cupy and NPU uses torch_npu. Instead of replace these to xpu only functions, we provided a CompressedBackend to do the compressed_allreduce work where users can add their own packbits/unpackbits kernels, which is a general path for all kinds of accelerators.

In this PR, we:

Add CompressedBackend for onebitAdam, onebitLamb and zerooneAdam
Add XPU implement of packbits/unpackbits with SYCL, built in PackbitsBuilder
Add tests for onebit with CompressedBackend

Apr 28 '24 02:04 Liangliang-Ma

@tjruwase this PR is an approach to abstract the generic part of 1bit-adam and implment accelerator dependent part with DeepSpeed custom op builder. So 1bit-adam does not need to depend on accelerator specific libraries.

@inkcherry I remember you investigated in 1bit adam portability before, FYI this PR implement a portable version of 1bit adam support.

May 07 '24 00:05 delock

Hi @tjruwase , could you please help to review this PR? Thanks!

May 14 '24 06:05 Liangliang-Ma

@tjruwase I have noticed that in onebit unit test, the onebit comm backend is assigned like this: "comm_backend_name": get_accelerator().communication_backend_name(). But in fact, it's not the same thing, for get_accelerator().communication_backend_name() chooses which backend to do regular comm, like broadcast, allgather etc, while onebit backend specifies how to do error-compensated compression. We have MPI-based onebit backend not in accelerators' comm backend. And also I found that both HPU and NPU accelerator have 'hccl' communication_backend_name, which will lead to same onebit backend. So maybe we can change this part? How about we add a interface under accelerator like get_accelerator().onebit_backend() or change unit test to a more general way? Thanks.

May 24 '24 03:05 Liangliang-Ma

@tjruwase Hi, May I ask if you could help to review my last comment or merge this one first? Thanks

Jun 05 '24 03:06 Liangliang-Ma

@Liangliang-Ma, apologies for delay. I am still thinking about your last comment, but will not delay this PR.

Jun 05 '24 20:06 tjruwase

DeepSpeed DeepSpeed copied to clipboard

Add Compressedbackend for Onebit optimizers

DeepSpeed
DeepSpeed copied to clipboard