bitsandbytes
Fix optimizer update error with non-contiguous grads/params
Fixes #1185
Non-contiguous params/gradients produced by operations such as torch.chunk and all_gather are ubiquitous in distributed training frameworks such as ZeRO. Since the C++ kernels assume row-major (contiguous) inputs, this change avoids update errors on such tensors.
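A minimal sketch of how such tensors arise and the general remedy (this illustrates the contiguity issue, not the exact kernel-side fix in this PR):

```python
import torch

# Chunking a parameter along a non-leading dim (as sharding schemes like
# ZeRO effectively do) yields views that are NOT row-major contiguous.
p = torch.randn(4, 8)
a, b = torch.chunk(p, 2, dim=1)
assert not a.is_contiguous()  # a is a strided view into p

# A C++ kernel that assumes row-major layout would read p's memory with
# the wrong strides here. The generic remedy: materialize a contiguous
# copy before handing the tensor to such a kernel.
a_fixed = a.contiguous()
assert a_fixed.is_contiguous()
assert torch.equal(a_fixed, a)  # same values, now in row-major order
```

Note that chunking along dim=0 of a 2-D tensor would still be contiguous; the error surfaces only for strided views like the dim=1 chunks above.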