
[Feature] Split cross-entropy computation in SP

Open · Edenzzzz opened this issue on Aug 01 '24 · 0 comments

📌 Checklist before creating the PR

  • [ ] I have created an issue for this PR for traceability
  • [ ] The title follows the standard format: [doc/gemini/tensor/...]: A concise description
  • [ ] I have added relevant tags if possible for us to better distinguish different PRs
  • [ ] I have installed pre-commit: pip install pre-commit && pre-commit install

🚨 Issue number

Link this PR to your issue with keywords such as fixed, closed, or resolved so that the linked issue is closed automatically upon merge.

e.g. fixed #1234, closed #1234, resolved #1234

📝 What does this PR do?

  1. Avoid gathering hidden states across the sequence-parallel group before the cross-entropy computation, saving communication and compute; all-reduce the final loss and gradients instead (see the sketch after this list).
  2. Support sequence parallelism (SP) for GPT.
  3. Merge the pipeline-parallel model forward and the plain model forward logic for GPT2 and Llama to ease maintenance in the future.
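
Below is a minimal sketch of the idea in item 1, assuming a PyTorch sequence-parallel setup where each rank holds a contiguous shard of the sequence dimension. The function name `sp_cross_entropy` and the `sp_group` handle are illustrative placeholders, not the actual Colossal-AI API.

```python
# Illustrative sketch only: `sp_cross_entropy` and `sp_group` are hypothetical
# names, not the Colossal-AI implementation.
import torch
import torch.distributed as dist
import torch.nn.functional as F


def sp_cross_entropy(shard_logits: torch.Tensor,
                     shard_labels: torch.Tensor,
                     sp_group: dist.ProcessGroup,
                     ignore_index: int = -100):
    """Cross entropy over this rank's sequence shard.

    shard_logits: (local_seq_len, vocab_size) shard of the logits
    shard_labels: (local_seq_len,) matching shard of the targets
    """
    # Per-token loss on the local shard only: no all-gather of hidden
    # states or logits across the sequence-parallel group.
    loss_sum = F.cross_entropy(
        shard_logits.float(), shard_labels,
        ignore_index=ignore_index, reduction="sum",
    )
    num_tokens = (shard_labels != ignore_index).sum()

    # Share the global token count so every rank normalizes identically.
    dist.all_reduce(num_tokens, op=dist.ReduceOp.SUM, group=sp_group)
    loss = loss_sum / num_tokens.clamp(min=1)

    # All-reduce only the scalar loss value (outside autograd) for reporting.
    # Backward through `loss` already yields the correct gradient for this
    # shard's logits; parameter gradients are synchronized by the usual
    # gradient all-reduce.
    with torch.no_grad():
        reported_loss = loss.detach().clone()
        dist.all_reduce(reported_loss, op=dist.ReduceOp.SUM, group=sp_group)

    return loss, reported_loss
```

Compared with gathering the full (seq_len, vocab_size) logits onto every rank, only two scalar all-reduces cross the wire, which is where the communication and compute savings in item 1 come from.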

💥 Checklist before requesting a review

  • [ ] I have linked my PR to an issue (instruction)
  • [ ] My issue clearly describes the problem/feature/proposal, with diagrams/charts/table/code if possible
  • [ ] I have performed a self-review of my code
  • [ ] I have added thorough tests.
  • [ ] I have added docstrings for all the functions/methods I implemented

⭐️ Do you enjoy contributing to Colossal-AI?

  • [ ] 🌝 Yes, I do.
  • [ ] 🌚 No, I don't.

Tell us more if you don't enjoy contributing to Colossal-AI.
