Ke Wen
Sorry for the late reply. We have migrated the PiPPy library into [`torch.distributed.pipelining`](https://github.com/pytorch/pytorch/tree/main/torch/distributed/pipelining). Here is our new documentation: https://pytorch.org/docs/main/distributed.pipelining.html. In the "Option 2" section, you can see: > The Pipe object provides...
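For concreteness, here is a minimal sketch of that "Option 2" tracer frontend (the toy model, sizes, and split point below are illustrative, not taken from the docs):

```python
import torch
from torch.distributed.pipelining import SplitPoint, pipeline

# Toy model: 8 linear layers; submodule names are "0".."7".
mlp = torch.nn.Sequential(*[torch.nn.Linear(1024, 1024) for _ in range(8)])
x = torch.randn(32, 1024)  # one example micro-batch

# Trace the model and split it before layer "4" into two stages.
pipe = pipeline(mlp, mb_args=(x,), split_spec={"4": SplitPoint.BEGINNING})

# Each rank then materializes its own stage from the Pipe object, e.g.:
# stage = pipe.build_stage(stage_index, device)
```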
Hmm, do you mean getting back the full model at the end of training, but before saving the final checkpoint? It might be hard, I think, because each stage's updated...
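For what it's worth, one conceivable workaround (a sketch only; `stage_module`, `world_size`, `model`, and the file names are placeholders): save each stage's state dict from its own rank and stitch them together offline, which should work as long as the stage state dicts keep the original model's FQNs:

```python
import torch
import torch.distributed as dist

# On each pipeline rank: dump the local stage's updated weights.
torch.save(stage_module.state_dict(), f"stage_{dist.get_rank()}.pt")

# Offline, in a single process: merge the per-stage dicts. This assumes
# the stage state dicts use the original model's FQNs, so they can be
# unioned back into one full state dict.
full_sd = {}
for rank in range(world_size):
    full_sd.update(torch.load(f"stage_{rank}.pt"))
model.load_state_dict(full_sd)
```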
Thanks for making it work! Quick comment: Do you mind creating a dedicated example for DCP + PP? You can copy the model out (we plan to build a "model...
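As a starting point, a DCP + PP save/load could be as small as this sketch (`stage_module` and the checkpoint path are placeholders):

```python
import torch.distributed.checkpoint as dcp

# Each pipeline rank contributes only its own stage's tensors; DCP
# combines them into one sharded checkpoint on disk.
state_dict = {"model": stage_module.state_dict()}
dcp.save(state_dict, checkpoint_id="checkpoints/step_100")

# Loading is symmetric: each rank reads back just its stage's part,
# in place, into the provided state_dict.
dcp.load(state_dict, checkpoint_id="checkpoints/step_100")
```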
What's our plan for this PR? @LucasLLC I think we are pretty close to done. Would the following next steps be reasonable? 1. Move the example to `examples/checkpoint`, and...
For code quality checks, please run:
```
./format.sh
./check.sh
```
Documenting my discussion with @wanchaol regarding DTensor and `scaled_dot_product_attention`: @kwen2501 : Should we do to_local as soon as we did colwise, or should we do to_local when we hit...
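To make the options concrete, here is a sketch of the "to_local at SDPA" variant (the helper and placements are illustrative, not an agreed-upon design):

```python
import torch.nn.functional as F
from torch.distributed.tensor import DTensor

# q, k, v are DTensors sharded column-wise (i.e. across heads) by the
# upstream colwise-parallel projections. One option: drop to plain
# local tensors right at the attention call, then wrap the result back
# into a DTensor with the same mesh and placements.
def sharded_sdpa(q: DTensor, k: DTensor, v: DTensor) -> DTensor:
    out = F.scaled_dot_product_attention(
        q.to_local(), k.to_local(), v.to_local()
    )
    return DTensor.from_local(out, q.device_mesh, q.placements)
```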
Hi @leiwen83, that's an interesting question. I think at the ZeRO-2 stage (where the gradients are sharded), there would need to be some special arrangement: as each micro-batch runs its...
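A sketch of one such "special arrangement" (hypothetical; `stage_module` and `micro_batches` are placeholders): let the full gradients accumulate across all micro-batches, and reduce-scatter only once at the end:

```python
import torch
import torch.distributed as dist

# Run all micro-batches first; gradients accumulate unsharded in .grad.
for micro_batch in micro_batches:
    loss = stage_module(micro_batch).sum()
    loss.backward()

# Only after the last backward: reduce-scatter each gradient so every
# rank keeps just its shard, matching ZeRO-2's sharded-gradient layout.
# (Assumes each grad's dim 0 is divisible by the world size.)
world_size = dist.get_world_size()
for p in stage_module.parameters():
    shard = torch.empty_like(p.grad.chunk(world_size)[0])
    dist.reduce_scatter_tensor(shard, p.grad)
    p.grad = None  # free the full gradient
    # hand `shard` to the sharded optimizer here (not shown)
```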
Yes, please! You are very welcome to open a PR!
Is the numerical difference seen only in the backward pass, or in the forward pass too?
The original plan was for the tracer's stage and the manual stage to be in separate files (files were mapped more 1:1 with classes back then), with the base class on the manual...