bitsandbytes
Fix optimizer update error with non-contiguous grads/params
Fixes #1185
Non-contiguous params/gradients produced by operations such as torch.chunk and all_gather are ubiquitous in distributed training frameworks such as ZeRO. Since the C++ kernels assume row-major (contiguous) inputs, this change avoids update errors on such tensors.
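A minimal sketch of how such tensors arise and the general remedy (this illustrates the contiguity issue, not the exact kernel-side fix in this PR):

```python
import torch

# Chunking a parameter along a non-leading dim (as sharding schemes like
# ZeRO effectively do) yields views that are NOT row-major contiguous.
p = torch.randn(4, 8)
a, b = torch.chunk(p, 2, dim=1)
assert not a.is_contiguous()  # a is a strided view into p

# A C++ kernel that assumes row-major layout would read p's memory with
# the wrong strides here. The generic remedy: materialize a contiguous
# copy before handing the tensor to such a kernel.
a_fixed = a.contiguous()
assert a_fixed.is_contiguous()
assert torch.equal(a_fixed, a)  # same values, now in row-major order
```

Note that chunking along dim=0 of a 2-D tensor would still be contiguous; the error surfaces only for strided views like the dim=1 chunks above.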