Wenxuan Tan

Results 40 issues of Wenxuan Tan

## 📌 Checklist before creating the PR - [ ] I have created an issue for this PR for traceability - [x] The title follows the standard format: `[doc/gemini/tensor/...]: A...

## 📌 Checklist before creating the PR - [ ] I have created an issue for this PR for traceability - [ ] The title follows the standard format: `[doc/gemini/tensor/...]:...

### Is there an existing issue for this bug? - [X] I have searched the existing issues ### Required Troubleshooting Steps - [X] I have followed these troubleshooting steps -...

ssh

This line ignores `min_8bit_size` by setting all params with numel < 4096 to fp32, so I've removed it. cc @Titus-von-Koeller

Fixes #1185. Non-contiguous params/gradients produced by `torch.chunk`, `all_gather`, etc. are ubiquitous in distributed training setups such as ZeRO. This avoids update errors, since the C++ kernels assume row-major inputs....
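
To illustrate why a row-major assumption breaks on chunked views, here is a minimal pure-Python sketch. It is not the library's code: `View`, `chunk_cols`, and `naive_kernel_read` are hypothetical names that model a strided 2-D view over flat storage and a kernel that reads `numel` consecutive elements without checking strides.

```python
from dataclasses import dataclass


@dataclass
class View:
    """Hypothetical strided 2-D view over a flat, shared buffer."""
    storage: list        # flat backing buffer
    shape: tuple         # (rows, cols)
    strides: tuple       # elements skipped per step along each dim
    offset: int = 0      # index of element [0, 0] in storage

    def is_contiguous(self):
        # Row-major means one row follows the previous one directly.
        return self.strides == (self.shape[1], 1)

    def get(self, i, j):
        return self.storage[self.offset + i * self.strides[0] + j * self.strides[1]]

    def contiguous(self):
        # Copy into fresh row-major storage when the layout is strided.
        if self.is_contiguous():
            return self
        rows, cols = self.shape
        flat = [self.get(i, j) for i in range(rows) for j in range(cols)]
        return View(flat, self.shape, (cols, 1), 0)


def chunk_cols(view, chunks):
    # Column-wise split: each chunk shares storage and keeps the parent's strides.
    rows, cols = view.shape
    width = cols // chunks
    return [
        View(view.storage, (rows, width), view.strides,
             view.offset + k * width * view.strides[1])
        for k in range(chunks)
    ]


def naive_kernel_read(view):
    # Assumed kernel behaviour: read numel consecutive elements, ignoring strides.
    n = view.shape[0] * view.shape[1]
    return view.storage[view.offset:view.offset + n]


full = View(list(range(8)), (2, 4), (4, 1))   # [[0, 1, 2, 3], [4, 5, 6, 7]]
left, right = chunk_cols(full, 2)             # right is logically [[2, 3], [6, 7]]

print(right.is_contiguous())                  # False
print(naive_kernel_read(right))               # [2, 3, 4, 5]: touches the wrong elements
print(naive_kernel_read(right.contiguous()))  # [2, 3, 6, 7]
```

In this model the strided chunk makes the naive read pick up elements 4 and 5, which belong to the other chunk, while a contiguous copy gives the kernel the layout it expects.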