Jiani Wang

Results 37 comments of Jiani Wang

NOTE: Not for review yet; I will test locally.

> oh I think the bug was introduced here -- now with wrong indentation https://github.com/pytorch/torchtitan/pull/1776/files#diff-83b7868cc3b5fde38ae75ccd8346675495ed27207bc75c422cf8c2ef4d8096d3L210-L218

Can you elaborate more on this? Why does this cause the memory usage increase?

> what's the issue between compile + SAC + MoE?

SAC will wrap each submodule of TransformerBlock separately ([_apply_op_sac_to_transformer_block_with_flex](https://github.com/pytorch/torchtitan/blob/refs/heads/main/torchtitan/distributed/activation_checkpoint.py#L158)), which will make each submodule of TransformerBlock an instance of CheckpointWrapper....
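As a rough illustration of that wrapping (a minimal sketch, not torchtitan's actual `_apply_op_sac_to_transformer_block_with_flex`; `ToyTransformerBlock` and its submodule names are made up for this example):

```python
# Sketch only: per-submodule SAC wrapping turns each child of a block into a
# CheckpointWrapper, so compile sees wrapper boundaries at every child instead
# of one flat module.
import torch.nn as nn
from torch.distributed.algorithms._checkpoint.checkpoint_wrapper import (
    CheckpointWrapper,
    checkpoint_wrapper,
)


class ToyTransformerBlock(nn.Module):
    """Hypothetical stand-in for a TransformerBlock with a few submodules."""

    def __init__(self, dim: int = 16):
        super().__init__()
        self.attention = nn.Linear(dim, dim)
        self.feed_forward = nn.Linear(dim, dim)

    def forward(self, x):
        return self.feed_forward(self.attention(x))


block = ToyTransformerBlock()

# Wrap each submodule separately, analogous to what the op-level SAC helper does.
for name, child in list(block.named_children()):
    block.register_module(name, checkpoint_wrapper(child))

# Every submodule is now a CheckpointWrapper instance.
print(all(isinstance(m, CheckpointWrapper) for m in block.children()))  # True
```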

To check my understanding:

> If you're only compiling a single op like FlexAttention, it is fine to not be able to see into the graph.

So if only FlexAttn...
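For reference, compiling only the FlexAttention op (rather than the whole block) looks roughly like this; a minimal sketch assuming a recent PyTorch with the `torch.nn.attention.flex_attention` API and a CUDA device, with arbitrary example shapes:

```python
import torch
from torch.nn.attention.flex_attention import flex_attention

# Compile just the single op; the surrounding model stays eager, so compile
# never needs to trace through CheckpointWrapper boundaries.
compiled_flex_attention = torch.compile(flex_attention)

# (batch, heads, seq_len, head_dim) -- arbitrary example shapes
q = torch.randn(2, 4, 128, 64, device="cuda")
k = torch.randn(2, 4, 128, 64, device="cuda")
v = torch.randn(2, 4, 128, 64, device="cuda")

out = compiled_flex_attention(q, k, v)
```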

Thanks for asking! Currently CP is not officially supported for Qwen3, and we haven't implemented and tested CP on Qwen3. This is because of the RoPE embedding differences (In Qwen3,...

> > Removing the version.txt file will break any builds intending to use the PyTorch Dev Infra build system and since this isn't urgent, I'd like to not merge quite...