Hongxin Liu
### Describe the feature In our current design, the replay buffer is not distributed. For the consistency and generalization of data sampling during training, each process has a complete copy...
## 📌 Checklist before creating the PR - [ ] I have created an issue for this PR for traceability - [ ] The title follows the standard format: `[doc/gemini/tensor/...]:...
## 📌 Checklist before creating the PR - [ ] I have created an issue for this PR for traceability - [x] The title follows the standard format: `[doc/gemini/tensor/...]: A...
## Overview The current implementation has not been tested on many models. We need to add large-scale correctness verification. Want to track the development progress? Take a look at the proposal: https://github.com/hpcaitech/ColossalAI/discussions/3124 kanban:...
## Overview This work should start after #3148. Once that is done, we will be able to create a model with lazy initialization and sharding. We have to verify the correctness for...
## Overview We have implemented a single-process version. We may want lazy tensors to be distributed during or after materialization; this feature may be powered by dtensor. Want to track the development progress?...
## 📌 Checklist before creating the PR - [x] I have created an issue for this PR for traceability - [x] The title follows the standard format: `[doc/gemini/tensor/...]: A concise...