support shardinit option to avoid OPT OOM initializing problem

Open nemoramo opened this issue 1 year ago • 1 comments

📌 Checklist before creating the PR

  • [x] I have created an issue for this PR for traceability
  • [x] The title follows the standard format: [doc/gemini/tensor/...]: A concise description
  • [ ] I have added relevant tags if possible for us to better distinguish different PRs

🚨 Issue number

Link this PR to your issue with words like fixed to automatically close the linked issue upon merge

e.g. fixed #1234, closed #1234, resolved #1234

fixed #2758

📝 What does this PR do?

Summarize your work here. If you have any plots/diagrams/screenshots/tables, please attach them here.

This commit adds a “shardinit” option to the OPT example to overcome the OOM problem caused by naive model initialization.
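To illustrate the general idea behind sharded initialization, here is a minimal conceptual sketch in plain Python. It does not use ColossalAI or PyTorch and is not the code in this PR; the helper names `shard_bounds` and `init_shard` are hypothetical. It assumes a parameter matrix split by rows across `world_size` ranks, so each rank allocates only its own slice instead of the full matrix.

```python
# Conceptual sketch only: plain Python, no ColossalAI or PyTorch calls.
# shard_bounds and init_shard are hypothetical names used for illustration.
import random
from typing import List, Tuple


def shard_bounds(num_rows: int, world_size: int, rank: int) -> Tuple[int, int]:
    """Return the [start, end) row range owned by `rank`."""
    base, extra = divmod(num_rows, world_size)
    # The first `extra` ranks each take one additional row.
    start = rank * base + min(rank, extra)
    end = start + base + (1 if rank < extra else 0)
    return start, end


def init_shard(num_rows: int, num_cols: int, world_size: int,
               rank: int, seed: int = 0) -> List[List[float]]:
    """Allocate and initialize only the rows owned by this rank."""
    start, end = shard_bounds(num_rows, world_size, rank)
    rng = random.Random(seed + rank)  # per-rank seed, chosen arbitrarily here
    return [[rng.gauss(0.0, 0.02) for _ in range(num_cols)]
            for _ in range(end - start)]


if __name__ == "__main__":
    rows, cols, world_size = 10, 4, 4
    shards = [init_shard(rows, cols, world_size, r) for r in range(world_size)]
    print([len(s) for s in shards])  # rows held by each rank
```

With naive initialization every process would build all `rows` rows before any partitioning happens; in the sketch, peak memory per rank is bounded by its own slice.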

💥 Checklist before requesting a review

  • [x] I have linked my PR to an issue (instruction)
  • [x] My issue clearly describes the problem/feature/proposal, with diagrams/charts/table/code if possible
  • [x] I have performed a self-review of my code
  • [ ] I have added thorough tests.
  • [ ] I have added docstrings for all the functions/methods I implemented

⭐️ Do you enjoy contributing to Colossal-AI?

  • [x] 🌝 Yes, I do.
  • [ ] 🌚 No, I don't.

Tell us more if you don't enjoy contributing to Colossal-AI.

nemoramo, Mar 07 '23 09:03

Hi @nemoramo Thank you very much for your contribution!

binmakeswell, Mar 07 '23 13:03

Hi @nemoramo Thanks for your contribution. But it fails in CI, could you please fix it? Thanks.

binmakeswell, Mar 08 '23 02:03

Hi @nemoramo Thanks for your contribution. But it fails in CI, could you please fix it? Thanks.

It seems no coverage report is included. Any ideas to fix this?

nemoramo, Mar 08 '23 02:03

Hi, the CI failure can be ignored as there is no change in the colossalai library.

FrankLeeeee, Mar 08 '23 05:03