[fsdp] impl save/load shard model/optimizer

Open ericxsun opened this issue 5 months ago • 6 comments

📌 Checklist before creating the PR

  • [x] I have created an issue for this PR for traceability
  • [x] The title follows the standard format: [doc/gemini/tensor/...]: A concise description
  • [ ] I have added relevant tags if possible for us to better distinguish different PRs

🚨 Issue number

Link this PR to your issue with words like fixed to automatically close the linked issue upon merge

e.g. fixed #1234, closed #1234, resolved #1234

resolved https://github.com/hpcaitech/ColossalAI/issues/5328

📝 What does this PR do?

Summarize your work here. If you have any plots/diagrams/screenshots/tables, please attach them here.

💥 Checklist before requesting a review

  • [x] I have linked my PR to an issue (instruction)
  • [x] My issue clearly describes the problem/feature/proposal, with diagrams/charts/table/code if possible
  • [x] I have performed a self-review of my code
  • [ ] I have added thorough tests.
  • [ ] I have added docstrings for all the functions/methods I implemented

⭐️ Do you enjoy contributing to Colossal-AI?

  • [x] 🌝 Yes, I do.
  • [ ] 🌚 No, I don't.

Tell us more if you don't enjoy contributing to Colossal-AI.

ericxsun avatar Feb 02 '24 08:02 ericxsun

Could someone from the ColossalAI team review this?

Thanks. cc @flybird11111

ericxsun avatar Feb 06 '24 07:02 ericxsun

Could you add unit tests to check this feature?

ver217 avatar Feb 06 '24 07:02 ver217

I've tested it in my local experiments (image).

But how do I test it as a unit test under colossalai/tests?

ericxsun avatar Feb 06 '24 07:02 ericxsun

You can add your tests in tests/test_checkpoint_io/test_torch_fsdp_checkpoint_io.py and just run pytest tests/test_checkpoint_io/test_torch_fsdp_checkpoint_io.py. Thanks!

ver217 avatar Feb 06 '24 07:02 ver217

Okay

ericxsun avatar Feb 06 '24 08:02 ericxsun

@ver217

pytest tests/test_checkpoint_io/test_torch_fsdp_checkpoint_io.py
================================================================= test session starts ==================================================================
platform linux -- Python 3.10.11, pytest-8.0.0, pluggy-1.4.0
rootdir: ~/ColossalAI
configfile: pytest.ini
plugins: hypothesis-6.75.2, anyio-4.0.0
collected 1 item

tests/test_checkpoint_io/test_torch_fsdp_checkpoint_io.py .                                                                                      [100%]

================================================================== 1 passed in 6.91s ===================================================================

ericxsun avatar Feb 18 '24 07:02 ericxsun
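
The round-trip check requested in the thread (save a sharded checkpoint, load it back, compare with the original) can be illustrated with a minimal, framework-free sketch. This is not ColossalAI's or PyTorch FSDP's implementation: `save_sharded`, `load_sharded`, the `shard-XXXXX.json` file naming, and the `index.json` layout are hypothetical names chosen for illustration, and plain Python lists stand in for tensors.

```python
import json
import os
import tempfile


def save_sharded(state, directory, max_items_per_shard):
    """Split `state` (name -> list of floats) into JSON shard files plus an index.

    Hypothetical helper for illustration only; not a ColossalAI API.
    """
    os.makedirs(directory, exist_ok=True)
    index = {}  # parameter name -> shard file that holds it
    shard, shard_id, used = {}, 0, 0

    def flush():
        # Write the current shard to disk and record its entries in the index.
        nonlocal shard, shard_id, used
        if not shard:
            return
        filename = f"shard-{shard_id:05d}.json"
        with open(os.path.join(directory, filename), "w") as f:
            json.dump(shard, f)
        for name in shard:
            index[name] = filename
        shard, shard_id, used = {}, shard_id + 1, 0

    for name, values in state.items():
        # Start a new shard when adding this entry would exceed the budget.
        if shard and used + len(values) > max_items_per_shard:
            flush()
        shard[name] = values
        used += len(values)
    flush()

    with open(os.path.join(directory, "index.json"), "w") as f:
        json.dump({"weight_map": index}, f)
    return index


def load_sharded(directory):
    """Rebuild the full state by reading every shard listed in the index."""
    with open(os.path.join(directory, "index.json")) as f:
        weight_map = json.load(f)["weight_map"]
    state = {}
    for filename in sorted(set(weight_map.values())):
        with open(os.path.join(directory, filename)) as f:
            state.update(json.load(f))
    return state


def test_sharded_round_trip():
    # Toy "model state"; real tests would compare tensors instead of lists.
    state = {
        "layer0.weight": [0.1, 0.2, 0.3, 0.4],
        "layer0.bias": [0.5, 0.6],
        "layer1.weight": [0.7, 0.8, 0.9],
    }
    with tempfile.TemporaryDirectory() as tmp:
        index = save_sharded(state, tmp, max_items_per_shard=5)
        assert len(set(index.values())) > 1  # the state really was split
        assert load_sharded(tmp) == state    # nothing lost or altered
```

A real test in tests/test_checkpoint_io/test_torch_fsdp_checkpoint_io.py would presumably follow the same save, load, compare pattern, applied to the FSDP-wrapped model and optimizer state rather than to plain dictionaries.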