Kaiyang Guo

Results: 8 comments of Kaiyang Guo

Hi @awgu, thanks for doing this. Since FSDP + generate is very slow, I wonder whether this patch also improves efficiency?

> @kygguo Are you using `reshard_after_forward=True` / `FULL_SHARD`?

I am new to FSDP, so it would be nice if you could hint at where I can check this... Basically, I am running the official DPO...

Just found it's FULL_SHARD, but I think I can switch it to another strategy if there's room for a speedup.
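For anyone else wondering where to check: the strategy is whatever was passed to the FSDP constructor, with FULL_SHARD as the default. A rough sketch of inspecting it on an already-wrapped module; `fsdp_model` is a placeholder name here, and `sharding_strategy` is an FSDP implementation attribute rather than documented API, so this may differ across PyTorch versions:

```
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

def print_sharding_strategy(fsdp_model: FSDP) -> None:
    # Each FSDP instance records the strategy it was constructed with;
    # FULL_SHARD is used when nothing is passed explicitly.
    print(fsdp_model.sharding_strategy)
```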

Hi @awgu, passing `sharding_strategy=ShardingStrategy.SHARD_GRAD_OP` helps! When I previously used FULL_SHARD, running the code got stuck in `model.generate()` and never returned. Changing to SHARD_GRAD_OP avoids this, even if I use...
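For anyone who hits the same hang, here is a minimal sketch of wrapping a model with SHARD_GRAD_OP. The tiny `nn.Linear` model and the torchrun-style setup are placeholders, not the actual DPO training code:

```
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP, ShardingStrategy

# Standard torchrun-style setup; rank and world size come from the environment.
dist.init_process_group(backend='nccl')
torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

# Placeholder model standing in for the policy model used in DPO training.
model = nn.Linear(1024, 1024).cuda()

# SHARD_GRAD_OP shards gradients and optimizer states but keeps parameters
# gathered after forward, so generation does not re-gather them at every step.
fsdp_model = FSDP(model, sharding_strategy=ShardingStrategy.SHARD_GRAD_OP)
```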

I have the same problem, and increasing the NCCL timeout threshold works for me.

```
import torch.distributed as dist
from datetime import timedelta

# Raise the collective-op timeout so long-running steps do not trip the NCCL watchdog.
dist.init_process_group(backend='nccl', init_method='env://', timeout=timedelta(hours=2))
```