DeepSpeed
Add CompressedBackend for Onebit optimizers
In the process of adding onebit optimizer support for XPU devices, we noticed that across accelerators the main implementation difference in compressed_allreduce lies in packbits and unpackbits: CUDA uses cupy and NPU uses torch_npu. Instead of replacing these with XPU-only functions, we provide a CompressedBackend that performs the compressed_allreduce work and lets users plug in their own packbits/unpackbits kernels, giving a general path for all kinds of accelerators.
In this PR, we:
- Add CompressedBackend for OnebitAdam, OnebitLamb and ZeroOneAdam
- Add an XPU implementation of packbits/unpackbits in SYCL, built via PackbitsBuilder
- Add tests for onebit with CompressedBackend
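To make the pluggable packbits/unpackbits path described above concrete, here is a minimal pure-PyTorch sketch of what such kernels compute. The function names `pack_bits`/`unpack_bits` and the byte layout are illustrative assumptions, not the actual SYCL kernels added by this PR; a real backend would replace them with device-specific ops.

```python
import torch

def pack_bits(signs: torch.Tensor) -> torch.Tensor:
    """Pack a bool tensor into uint8, 8 values per byte (numpy.packbits bit order)."""
    signs = signs.reshape(-1).to(torch.uint8)
    numel = signs.numel()
    padded = torch.zeros((numel + 7) // 8 * 8, dtype=torch.uint8, device=signs.device)
    padded[:numel] = signs
    weights = torch.tensor([128, 64, 32, 16, 8, 4, 2, 1], dtype=torch.uint8, device=signs.device)
    return (padded.view(-1, 8) * weights).sum(dim=1, dtype=torch.uint8)

def unpack_bits(packed: torch.Tensor, numel: int) -> torch.Tensor:
    """Inverse of pack_bits; recovers the first `numel` bool values."""
    weights = torch.tensor([128, 64, 32, 16, 8, 4, 2, 1], dtype=torch.uint8, device=packed.device)
    bits = (packed.unsqueeze(1) & weights) != 0
    return bits.reshape(-1)[:numel]

# Usage: 1000 sign bits travel as 125 bytes instead of 1000 bools.
x = torch.randn(1000)
signs = x > 0
packed = pack_bits(signs)
assert torch.equal(unpack_bits(packed, 1000), signs)
```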
@tjruwase this PR is an approach to abstract the generic part of 1bit-adam and implement the accelerator-dependent part with a DeepSpeed custom op builder, so that 1bit-adam does not need to depend on accelerator-specific libraries.
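As a hedged sketch (not this PR's actual code) of what that abstraction could look like at runtime: the accelerator-dependent kernel is looked up through the op builder mechanism, with a plain CPU fallback. The attribute name on the loaded module is assumed for illustration.

```python
import numpy as np
import torch
from deepspeed.accelerator import get_accelerator

def _cpu_packbits(signs: torch.Tensor) -> torch.Tensor:
    # Plain fallback via numpy.packbits; a real backend would use a device kernel.
    packed = np.packbits(signs.reshape(-1).cpu().numpy())
    return torch.from_numpy(packed).to(signs.device)

def get_packbits_kernel():
    # "PackbitsBuilder" is the builder name used in this PR; other accelerators
    # could register their own builder under the same name.
    builder = get_accelerator().create_op_builder("PackbitsBuilder")
    if builder is not None and builder.is_compatible():
        module = builder.load()   # JIT-builds the accelerator-specific kernel
        return module.packbits    # attribute name assumed for illustration
    return _cpu_packbits
```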
@inkcherry I remember you investigated 1bit adam portability before; FYI, this PR implements a portable version of 1bit adam support.
Hi @tjruwase , could you please help to review this PR? Thanks!
@tjruwase I have noticed that in the onebit unit test, the onebit comm backend is assigned like this:
`"comm_backend_name": get_accelerator().communication_backend_name()`.
But these are not really the same thing: get_accelerator().communication_backend_name() chooses which backend handles regular communication (broadcast, allgather, etc.), while the onebit backend specifies how error-compensated compression is done. We also have an MPI-based onebit backend that is not among the accelerators' comm backends. In addition, I found that both the HPU and NPU accelerators report 'hccl' as communication_backend_name, which would map them to the same onebit backend. So maybe we can change this part? How about adding an interface under the accelerator, like get_accelerator().onebit_backend(), or changing the unit test to something more general? Thanks.
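To make the suggestion concrete, here is a minimal sketch assuming a hypothetical get_accelerator().onebit_backend() hook; this hook does not exist in DeepSpeed today, only communication_backend_name() does, and the config values shown are illustrative.

```python
from deepspeed.accelerator import get_accelerator

def onebit_backend_name():
    accel = get_accelerator()
    # Proposed (hypothetical) interface: the accelerator names its 1-bit compression
    # backend separately from its regular communication backend.
    if hasattr(accel, "onebit_backend"):
        return accel.onebit_backend()  # e.g. "nccl", "mpi", "hccl", or "compressed"
    # Current unit-test behavior: reuse the regular comm backend name, which
    # conflates the two concepts and collides for HPU/NPU (both report "hccl").
    return accel.communication_backend_name()

# Illustrative 1-bit Adam config using the helper above.
ds_config = {
    "train_batch_size": 8,
    "optimizer": {
        "type": "OneBitAdam",
        "params": {
            "lr": 1e-3,
            "freeze_step": 2,
            "comm_backend_name": onebit_backend_name(),
        },
    },
}
```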
Hi @tjruwase, may I ask if you could review my last comment, or merge this one first? Thanks
@Liangliang-Ma, apologies for the delay. I am still thinking about your last comment, but will not delay this PR.