ColossalAI
[moe] support mixtral
📌 Checklist before creating the PR
- [ ] I have created an issue for this PR for traceability
- [ ] The title follows the standard format:
[doc/gemini/tensor/...]: A concise description
- [ ] I have added relevant tags if possible for us to better distinguish different PRs
🚨 Issue number
Link this PR to your issue with words like fixed to automatically close the linked issue upon merge
e.g.
fixed #1234, closed #1234, resolved #1234
📝 What does this PR do?
Summarize your work here. If you have any plots/diagrams/screenshots/tables, please attach them here.
💥 Checklist before requesting a review
- [ ] I have linked my PR to an issue (instruction)
- [ ] My issue clearly describes the problem/feature/proposal, with diagrams/charts/table/code if possible
- [ ] I have performed a self-review of my code
- [ ] I have added thorough tests.
- [ ] I have added docstrings for all the functions/methods I implemented
⭐️ Do you enjoy contributing to Colossal-AI?
- [ ] 🌝 Yes, I do.
- [ ] 🌚 No, I don't.
Tell us more if you don't enjoy contributing to Colossal-AI.
I tried the code in this CR, and unfortunately, I found that the loss does not decrease during training.
I used the code from the latest commit; however, after several iterations, the loss becomes NaN.
Just leave it?
This code is for experimentation only. The updated code is in progress and will be released soon.
Is there an approximate release time? The latest commit still contains bugs.
We are still working on the pipeline. Once we validate the process, we will release.