Olatunji Ruwase
@szhengac I am really sorry to hear that this has been a blocking issue for over a month. My concern is that this PR creates a strange code path that could be...
Got it. Thanks for sharing this scenario. In that case, what if you called [optimizer.refresh_fp32_params()](https://github.com/microsoft/DeepSpeed/blob/24055597bafe133299900a7e243663dd92bdde2c/deepspeed/runtime/zero/stage1.py#L1031-L1032) from your client script, after load_checkpoint() returns? Can you check if this achieves your...
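A rough sketch of what that could look like in a client script. The helper name `load_and_refresh` is hypothetical, and `engine` is assumed to be the object returned by `deepspeed.initialize()`; the `getattr` guard is only there because not every optimizer wrapper may expose `refresh_fp32_params()`.

```python
def load_and_refresh(engine, load_dir, tag=None):
    """Load a checkpoint, then re-sync the optimizer's fp32 copies of the weights.

    `engine` is assumed to be a DeepSpeed engine (or anything exposing
    load_checkpoint() and an `optimizer` attribute). This helper is
    illustrative only and is not part of DeepSpeed.
    """
    # Step 1: restore module and optimizer state from the checkpoint directory.
    result = engine.load_checkpoint(load_dir, tag=tag)

    # Step 2: refresh the fp32 params from the freshly loaded weights,
    # but only if the wrapped optimizer actually provides the method.
    refresh = getattr(engine.optimizer, "refresh_fp32_params", None)
    if callable(refresh):
        refresh()

    # Hand back whatever load_checkpoint() returned so the caller can use it.
    return result
```

In the client script this would replace the bare `load_checkpoint()` call, e.g. `load_and_refresh(model_engine, ckpt_dir)`, where `model_engine` and `ckpt_dir` are placeholders for your own names.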
@szhengac I am happy to look into the ZeRO-1 regression. Can you open an issue and provide repro steps? Regarding the original finetuning issue, here is my description of the...
@harishsg99, thanks for adding this feature. Can you please add some unit tests? Thanks!
@roywei, thanks for this PR. Can you please add some unit tests? An appropriate location would be `tests/unit/test_data.py`
@roywei, can you please address the formatting issues using the instructions [here](https://github.com/microsoft/DeepSpeed/blob/master/CONTRIBUTING.md#prerequisites)? Thanks.
@roywei, just wanted to check if you are able to finish this PR? Thanks!
@pacman100, can you please confirm what you observe with this issue? Does the run crash or simply hang? I am trying to match this with my observations. Thanks!
I see this error message in the gist log. Can you confirm that pybind11 is installed?
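One quick way to check, as a small sketch: run this with the same Python interpreter that launches the job. The function name `pybind11_status` is just illustrative.

```python
import importlib.util


def pybind11_status():
    """Report whether pybind11 is importable in the current environment."""
    # find_spec returns None when the package cannot be located.
    if importlib.util.find_spec("pybind11") is None:
        return "pybind11 is NOT installed in this environment"
    import pybind11
    return f"pybind11 {pybind11.__version__} is installed"


print(pybind11_status())
```

If it reports that pybind11 is missing, `pip install pybind11` in that environment should take care of it.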
@Santosh-Gupta, can you share more details on this, such as the model and command line?