Sam Ade Jacobs
Also, I suggest a comparison of existing prototext with PFE to verify correctness.
@Quentin-Anthony , I had no issue saving checkpoints with Megatron-DeepSpeed training of the GPT-350M model with both ZeRO-1 and ZeRO-3. Additional configurations of interest are as follows: 8 V100...
Ulysses is, in principle, attention-type agnostic. Although we haven’t specifically tested Ulysses with Ring Attention, as long as the query, key, and value (qkv) tensors can be split or sharded along the sequence and head dimensions,...
@Momo-Tori , yes, Ulysses is a form of TP in the sense that the attention block is head-parallel. In general, Ulysses is sequence parallelism plus head parallelism. It starts out...
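To make the sequence-parallelism-plus-head-parallelism idea concrete, here is a minimal numpy sketch of the resharding step: before attention each rank holds a slice of the sequence dimension for all heads, and an all-to-all exchange (simulated below with plain slicing and concatenation, not `torch.distributed.all_to_all`) gives each rank the full sequence for a subset of heads. The tensor sizes and rank count are illustrative, not from the discussion above.

```python
import numpy as np

P = 4                        # number of sequence-parallel ranks (illustrative)
SEQ, HEADS, DIM = 8, 4, 2    # toy sizes; SEQ and HEADS divisible by P

# Full activation tensor that would exist without any parallelism.
full = np.arange(SEQ * HEADS * DIM, dtype=np.float32).reshape(SEQ, HEADS, DIM)

# Before attention: each rank holds a contiguous slice of the sequence dim.
seq_shards = [full[r * (SEQ // P):(r + 1) * (SEQ // P)] for r in range(P)]

def all_to_all_seq_to_head(shards, P):
    """Simulate the all-to-all that turns sequence sharding into head sharding.

    Each rank slices its local sequence shard into P head groups and sends
    group h to rank h; every rank then concatenates the P pieces it receives
    along the sequence dimension, ending up with the full sequence for its
    own subset of heads.
    """
    hp = shards[0].shape[1] // P  # heads per rank after the exchange
    out = []
    for h in range(P):            # receiving rank h
        pieces = [shards[s][:, h * hp:(h + 1) * hp, :] for s in range(P)]
        out.append(np.concatenate(pieces, axis=0))
    return out

head_shards = all_to_all_seq_to_head(seq_shards, P)

# Each rank now sees the full sequence for its heads, so any attention
# kernel (dense, sparse, flash, ...) can run locally and unchanged.
assert head_shards[0].shape == (SEQ, HEADS // P, DIM)
assert np.array_equal(head_shards[1], full[:, 1:2, :])
```

Because each rank ends up with complete sequences for whole heads, the attention computation itself needs no modification, which is why Ulysses stays attention-type agnostic.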
@Kwen-Chen, your input data processing looks good to me. As for your second and third questions, you need a sequence-parallel-aware loss calculation ([see example here](https://github.com/microsoft/Megatron-DeepSpeed/blob/main/megatron/core/sequence_parallel/cross_entropy.py)).
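The core idea of a sequence-parallel-aware loss: each rank computes the loss sum and token count over its own sequence shard only, and an all-reduce combines them before taking the global mean. A minimal numpy sketch of that idea follows, with the all-reduce simulated by a plain sum; it is not the linked Megatron-DeepSpeed kernel, and all sizes are made up for illustration.

```python
import numpy as np

def local_token_nll(logits, labels):
    """Negative log-likelihood summed over a rank's local sequence shard."""
    # Log-softmax computed in a numerically stable way.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    logprobs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    return -logprobs[np.arange(len(labels)), labels].sum(), len(labels)

# Toy data: a full sequence of 8 tokens, vocab of 5, split across 2 ranks.
rng = np.random.default_rng(0)
logits = rng.normal(size=(8, 5))
labels = rng.integers(0, 5, size=8)

# Each rank holds a contiguous slice of the sequence dimension.
shards = [(logits[:4], labels[:4]), (logits[4:], labels[4:])]

# Each rank computes its local loss sum and token count ...
partials = [local_token_nll(lg, lb) for lg, lb in shards]

# ... then an all-reduce (simulated here by summing over ranks) combines
# them; the global mean divides by the global token count, not the local one.
total_loss = sum(p[0] for p in partials)
total_tokens = sum(p[1] for p in partials)
mean_loss = total_loss / total_tokens

# Sanity check against the unsharded computation.
full_loss, full_tokens = local_token_nll(logits, labels)
assert np.isclose(mean_loss, full_loss / full_tokens)
```

Averaging each rank's local mean instead of reducing sums and counts would give the wrong answer whenever shards have unequal token counts (e.g. with padding or label masking), which is the usual bug this pattern avoids.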
@LoggerHead22, we will look into this issue. As a stopgap, please consider using the [hpZ component of ZeRO++](https://www.deepspeed.ai/tutorials/zeropp/).
We recommend that you use [DeepSpeed universal checkpoint](https://github.com/microsoft/Megatron-DeepSpeed/tree/main/examples_deepspeed/universal_checkpointing).
@lleizuo , could you please provide additional details (e.g., model and training hyperparams) to reproduce this issue?
Hi @noob-ctrl, do you have a repro?
Closing, please re-open with a repro if needed.