DeepSpeed icon indicating copy to clipboard operation
DeepSpeed copied to clipboard

Add Compressedbackend for Onebit optimizers

Open Liangliang-Ma opened this issue 1 year ago • 3 comments

In the process of adding onebit optimizers support for XPU devices, we have noticed that for different accelerator, the main difference of implementation of compressed_allreduce lies on packbits and unpackbits. CUDA uses cupy and NPU uses torch_npu. Instead of replace these to xpu only functions, we provided a CompressedBackend to do the compressed_allreduce work where users can add their own packbits/unpackbits kernels, which is a general path for all kinds of accelerators.

In this PR, we:

  1. Add CompressedBackend for onebitAdam, onebitLamb and zerooneAdam
  2. Add XPU implement of packbits/unpackbits with SYCL, built in PackbitsBuilder
  3. Add tests for onebit with CompressedBackend

Liangliang-Ma avatar Apr 28 '24 02:04 Liangliang-Ma

@tjruwase this PR is an approach to abstract the generic part of 1bit-adam and implment accelerator dependent part with DeepSpeed custom op builder. So 1bit-adam does not need to depend on accelerator specific libraries.

@inkcherry I remember you investigated in 1bit adam portability before, FYI this PR implement a portable version of 1bit adam support.

delock avatar May 07 '24 00:05 delock

Hi @tjruwase , could you please help to review this PR? Thanks!

Liangliang-Ma avatar May 14 '24 06:05 Liangliang-Ma

@tjruwase I have noticed that in onebit unit test, the onebit comm backend is assigned like this: "comm_backend_name": get_accelerator().communication_backend_name(). But in fact, it's not the same thing, for get_accelerator().communication_backend_name() chooses which backend to do regular comm, like broadcast, allgather etc, while onebit backend specifies how to do error-compensated compression. We have MPI-based onebit backend not in accelerators' comm backend. And also I found that both HPU and NPU accelerator have 'hccl' communication_backend_name, which will lead to same onebit backend. So maybe we can change this part? How about we add a interface under accelerator like get_accelerator().onebit_backend() or change unit test to a more general way? Thanks.

Liangliang-Ma avatar May 24 '24 03:05 Liangliang-Ma

@tjruwase Hi, May I ask if you could help to review my last comment or merge this one first? Thanks

Liangliang-Ma avatar Jun 05 '24 03:06 Liangliang-Ma

@Liangliang-Ma, apologies for delay. I am still thinking about your last comment, but will not delay this PR.

tjruwase avatar Jun 05 '24 20:06 tjruwase