Ke Wen comments

Results: 65 comments of Ke Wen

Pippy ddp2pipe example doesn't work for pipeline

Hi, thanks for trying out this example. The `ddp2pipe` example is still a work in progress. It works in CPU mode, but in GPU mode it requires setting `TORCH_DISTRIBUTED_DEBUG=DETAIL` to...

torch.compile each TransformerBlock instead of the whole model

When PP is present, we may torch.compile the whole stage module, which is bigger than a transformer block, i.e.
```
pipe = pipeline(model, ...)
stage_mod = pipe.get_stage_module(stage_idx)
stage_mod = torch.compile(stage_mod)
...
```
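To make the two granularities concrete, here is a minimal single-process sketch that does not use PiPPy at all. `Block` and `make_stage` are hypothetical stand-ins for a transformer block and a stage module, and `backend="eager"` is chosen here only so the sketch runs without a compiler toolchain; a real setup would likely use the default backend.

```python
import torch
import torch.nn as nn


class Block(nn.Module):
    """Hypothetical stand-in for a TransformerBlock."""

    def __init__(self, dim: int):
        super().__init__()
        self.linear = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.relu(self.linear(x))


def make_stage(dim: int = 8, n_blocks: int = 2) -> nn.Sequential:
    """Hypothetical stand-in for a pipeline stage made of several blocks."""
    return nn.Sequential(*[Block(dim) for _ in range(n_blocks)])


torch.manual_seed(0)
x = torch.randn(4, 8)

# Option A: compile each block separately (one compiled region per block).
stage_a = make_stage()
with torch.no_grad():
    expected_per_block = stage_a(x)
for idx in range(len(stage_a)):
    stage_a[idx] = torch.compile(stage_a[idx], backend="eager")
with torch.no_grad():
    out_per_block = stage_a(x)

# Option B: compile the whole stage module (one larger compiled region).
stage_b = make_stage()
with torch.no_grad():
    expected_whole = stage_b(x)
stage_b_compiled = torch.compile(stage_b, backend="eager")
with torch.no_grad():
    out_whole = stage_b_compiled(x)
```

Both options should produce the same numbers as the uncompiled modules; the difference is only how much of the model each compiled region covers.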

Unable to find the compile_stage library

Hi, the `compile_stage` API is deprecated. Some up-to-date examples include: examples/basic/example.py, examples/huggingface, examples/llama

Unable to find the compile_stage library

Here is a simple training + optimizer example: https://github.com/pytorch/PiPPy/blob/main/test/test_optim.py
For backward, you can pass a loss function to PipelineSchedule:
```
# Attach to a schedule
schedule = PipelineScheduleGPipe(stage, args.chunks, loss_fn=loss_fn)
...
```
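As a rough illustration of what computing a loss per microbatch means, here is a single-process sketch in plain PyTorch. It is not PiPPy's implementation: the model, the data, and the division of each microbatch loss by `chunks` are all assumptions made here so that the accumulated gradient matches the full-batch gradient; how the schedule actually scales the loss is not stated in the comment above.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(4, 2)          # hypothetical toy model
loss_fn = nn.MSELoss()           # mean-reduced loss
x = torch.randn(8, 4)
target = torch.randn(8, 2)
chunks = 4                       # number of microbatches

# Reference: one backward pass over the full batch.
model.zero_grad()
loss_fn(model(x), target).backward()
full_grad = model.weight.grad.clone()

# Microbatched: split the batch, run loss_fn per chunk, accumulate gradients.
model.zero_grad()
for x_mb, t_mb in zip(x.chunk(chunks), target.chunk(chunks)):
    # Dividing by `chunks` is this sketch's choice; with equal-sized chunks
    # it makes the accumulated gradient equal to the full-batch gradient.
    (loss_fn(model(x_mb), t_mb) / chunks).backward()
micro_grad = model.weight.grad.clone()
```

After the loop, an optimizer step can be applied once to the accumulated gradients, as in ordinary gradient accumulation.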

Is there a way to export a pipeline stage?

Hi, at the tip of main, we added a new API:
```
pipe = pippy.pipeline(model, ...)
stage_module = pipe.get_stage_module(stage_idx)
```
I think you can export stage_module in a couple of ways:...

examples/Inference failed

Hi, sorry about the stale example. Please refer to examples/huggingface for the inference examples that interest you. Thanks.

Request for training examples using PipeStage and PipeSchedule

We are working on consolidating PipelineStage's runtime code with that from PipelineSchedule. An outcome of that would be training support. We will create a few examples when it is done....

Request for training examples using PipeStage and PipeSchedule

Here is a simple training example: https://github.com/pytorch/PiPPy/blob/main/examples/basic/example_train.py

FSDP+PP tracer issue with cast-to-bf16

An example program shows that torch.export would not burn dtype into the ExportedProgram at trace time: https://github.com/kwen2501/export-playground/blob/main/dtype.py See the kwargs for `zeros_like`. $ python dtype.py ``` opcode name target args...

FSDP+PP tracer issue with cast-to-bf16

The dtype of the `zeros_like` in the issue's program is likely what causes the dtype mismatch at bmm. We can use `graph_module.print_readable()` to see the original stack trace to identify...