Ke Wen

Results 65 comments of Ke Wen

Sorry for replying late. We have migrated the PiPPy library to [`torch.distributed.pipelining`](https://github.com/pytorch/pytorch/tree/main/torch/distributed/pipelining). Here is our new documentation: https://pytorch.org/docs/main/distributed.pipelining.html. In section "Option 2", you can see: > The Pipe object provides...

Hmm, do you mean getting back the full model at the end of training, but before saving the final checkpoint? It might be hard, I think, because each stage's updated...

Thanks for making it work! Quick comment: Do you mind creating a dedicated example for DCP + PP? You can copy the model out (we plan to build a "model...

What's our plan for this PR? @LucasLLC I think we are pretty close to the destination. Would the following next steps be reasonable? 1. Move the example to `examples/checkpoint`, and...

For code quality checks, please run:

```
./format.sh
./check.sh
```

Documenting my discussion with @wanchaol with respect to DTensor and `scaled_dot_product_attention`: @kwen2501: Should we do to_local as soon as we did colwise, or should we do to_local when we hit...

Hi @leiwen83, that's an interesting question. I think at the ZeRO-2 stage (where the gradients are sharded), there would need to be some special arrangement: As each micro-batch runs its...

Yes, please! You are very welcome to open a PR!

The original plan was for the tracer's stage and the manual stage to be in different files (files were more 1:1 mapped with classes back then), and the base to be on the manual...