Wenxuan Tan
This line ignores `min_8bit_size`: it keeps all params with numel < 4096 in 32-bit optimizer state regardless of the configured threshold, so I've removed it. cc @Titus-von-Koeller
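For context, a minimal sketch of the intended behavior, using a hypothetical `uses_8bit_state` helper rather than the actual bitsandbytes source: parameters smaller than the user-configurable `min_8bit_size` keep 32-bit optimizer state, instead of being compared against a hardcoded 4096.

```python
import torch

def uses_8bit_state(param: torch.Tensor, min_8bit_size: int = 4096) -> bool:
    # Respect the configured threshold rather than a literal 4096:
    # only params at or above `min_8bit_size` elements get 8-bit state.
    return param.numel() >= min_8bit_size

small = torch.nn.Parameter(torch.zeros(1024))
large = torch.nn.Parameter(torch.zeros(8192))
print(uses_8bit_state(small, min_8bit_size=2048))  # False: kept in 32-bit state
print(uses_8bit_state(large, min_8bit_size=2048))  # True: quantized to 8-bit state
```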
Fixes #1185. Non-contiguous params/gradients resulting from `torch.chunk`, `all_gather`, etc. are ubiquitous in distributed training frameworks such as ZeRO. This fix avoids update errors, as the C++ kernels assume row-major (contiguous) inputs. …
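A minimal sketch of the idea, using a hypothetical `ensure_contiguous` helper rather than the actual patch: shard views produced by `torch.chunk` can be non-contiguous, so they are made contiguous before being handed to fused kernels that expect row-major memory.

```python
import torch

def ensure_contiguous(param: torch.Tensor, grad: torch.Tensor):
    # Fused C++/CUDA optimizer kernels assume row-major (contiguous) buffers,
    # so copy non-contiguous views into contiguous memory first.
    if not param.is_contiguous():
        param = param.contiguous()
    if not grad.is_contiguous():
        grad = grad.contiguous()
    return param, grad

# Example: a chunked view of a transposed buffer is non-contiguous,
# which is typical of ZeRO-style parameter/gradient sharding.
full = torch.randn(4, 8)
shard = full.t().chunk(2, dim=0)[0]
print(shard.is_contiguous())            # False
shard, _ = ensure_contiguous(shard, shard)
print(shard.is_contiguous())            # True
```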