ColossalAI [moe] support mixtral

📌 Checklist before creating the PR

[ ] I have created an issue for this PR for traceability
[ ] The title follows the standard format: [doc/gemini/tensor/...]: A concise description
[ ] I have added relevant tags if possible for us to better distinguish different PRs

🚨 Issue number

Link this PR to your issue with words like fixed to automatically close the linked issue upon merge

e.g. fixed #1234, closed #1234, resolved #1234

📝 What does this PR do?

Summarize your work here. If you have any plots/diagrams/screenshots/tables, please attach them here.

💥 Checklist before requesting a review

[ ] I have linked my PR to an issue (instruction)
[ ] My issue clearly describes the problem/feature/proposal, with diagrams/charts/table/code if possible
[ ] I have performed a self-review of my code
[ ] I have added thorough tests.
[ ] I have added docstrings for all the functions/methods I implemented

⭐️ Do you enjoy contributing to Colossal-AI?

[ ] 🌝 Yes, I do.
[ ] 🌚 No, I don't.

Tell us more if you don't enjoy contributing to Colossal-AI.

Dec 15 '23 08:12 oahzxl

I tried the code in this CR, and unfortunately, I found that the loss does not decrease during training.

Dec 22 '23 07:12 luckyyangrun

I used the code from the latest commit; however, after several iterations, the loss becomes NaN.

Jan 18 '24 16:01 xs1997zju

Just leave it?

Jan 22 '24 08:01 xs1997zju

This code is for experimentation only. The updated code is in progress and will be released soon.

Jan 22 '24 08:01 oahzxl

Is there an approximate release time? The latest commit still contains bugs.

Jan 22 '24 09:01 xs1997zju

We are still working on the pipeline. Once we validate the process, we will release.

Jan 22 '24 09:01 TongLi3701
