Kaiyang Guo
Hi @awgu, thanks for doing this. Since FSDP + generate is very slow, I wonder whether this patch also improves efficiency?
> @kygguo Are you using `reshard_after_forward=True` / `FULL_SHARD`? I am new to FSDP; it would be nice if you could hint where I can check this. Basically, I am running the official DPO...
Just found it's `FULL_SHARD`, but I think I can change it to another strategy if there's room for speedup.
Sure, will report back later.
Hi @awgu, passing `sharding_strategy=ShardingStrategy.SHARD_GRAD_OP` helps! When I previously used `FULL_SHARD`, running the code got stuck in `model.generate()` and never returned. Changing to `SHARD_GRAD_OP` avoids this, even if I use...
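For anyone else hitting this: a minimal sketch of what the change above looks like when wrapping a model (not the exact script from this thread; the model and wrapping point are placeholders):

```python
# Sketch: wrap a model with FSDP using SHARD_GRAD_OP instead of the
# default FULL_SHARD. Assumes torch.distributed is already initialized.
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from torch.distributed.fsdp import ShardingStrategy


def wrap_model(model: nn.Module) -> FSDP:
    # SHARD_GRAD_OP shards only gradients and optimizer states, keeping
    # full parameters resident after forward, so the repeated forward
    # passes inside generate() avoid the per-step all-gathers that
    # FULL_SHARD (reshard_after_forward) performs.
    return FSDP(model, sharding_strategy=ShardingStrategy.SHARD_GRAD_OP)
```

The trade-off is higher memory usage than `FULL_SHARD`, since parameters are not resharded between forward and backward.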
Thanks for all the above!
I have the same problem, and increasing the NCCL timeout threshold works for me:

```python
import torch.distributed as dist
from datetime import timedelta

dist.init_process_group(backend='nccl', init_method='env://', timeout=timedelta(hours=2))
```
Hi, is there any update regarding this issue? It has bothered me for quite a few days.