inkcherry

Results: 17 comments of inkcherry

> @inkcherry Is there a link to the demo code? I'm interested in the potential use case of this feature proposal.

hi @delock, FYI: https://github.com/inkcherry/stanford_alpaca/tree/tp_demo (see the latest commit message). Due...

> @inkcherry, thanks for this PR. Are you able to provide some observed memory and latency benefits?

Hi @tjruwase, I use a setup of 4xA800 80G with PyTorch version...
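For context, a minimal sketch of how per-step latency and peak GPU memory numbers like these can be collected in PyTorch (`run_step` is a hypothetical stand-in for one training step; this is not the benchmark script referenced above):

```python
import time
import torch

def measure_step(run_step):
    """Illustrative probe: time one training step and report peak GPU memory."""
    torch.cuda.reset_peak_memory_stats()
    torch.cuda.synchronize()
    start = time.time()
    run_step()  # one forward/backward/optimizer step, supplied by the caller
    torch.cuda.synchronize()
    latency_s = time.time() - start
    peak_gib = torch.cuda.max_memory_allocated() / 2**30
    return latency_s, peak_gib
```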

> I'm unable to get this to work.
>
> First I run: `bash run.sh zero2` (all of the options fail with the same error)
>
> ```
> Time...
> ```

@hwchen2017 just a reminder in case you missed this~ thanks.

> @inkcherry, thanks for the quick PR. I have a few questions
>
> 1. It seems this PR is a workaround using `reuse_dist_env=False` rather than fixing autotp itself. Is...
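A sketch of where `reuse_dist_env` comes in, assuming DeepSpeed's repo-local unit-test harness (`DistributedTest` from `tests/unit/common.py`; exact details may vary by version):

```python
from unit.common import DistributedTest  # DeepSpeed's distributed test base class

class TestAutoTPExample(DistributedTest):
    world_size = 4
    reuse_dist_env = False  # spin up a fresh distributed environment per test

    def test_run(self):
        # the actual autotp test body would go here
        pass
```

Setting `reuse_dist_env = False` trades test-suite speed for isolation between tests, which is why it reads as a workaround rather than a fix to autotp itself.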

hi @Peter-Chou, I gave it a try and it works correctly. Here's my list of checkpoint files; it looks like yours is missing some content compared to mine. It...
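A quick, hypothetical way to diff two checkpoint directories and spot missing files (`compare_ckpt_dirs` is an illustrative helper, not part of any library):

```python
import os

def compare_ckpt_dirs(mine: str, yours: str) -> None:
    """Print filenames that differ between two checkpoint directories."""
    a, b = set(os.listdir(mine)), set(os.listdir(yours))
    print("missing on your side:", sorted(a - b))
    print("extra on your side:", sorted(b - a))
```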

hi @cynricfu, thanks for the report. This is likely due to the Transformers display logic using `total_batch_size` without accounting for `dp_world_size != world_size`. You can ignore it for now...
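To illustrate the mismatch, a sketch of how the displayed number can differ from the effective one when tensor parallelism shrinks the data-parallel world (both functions are hypothetical, for illustration only):

```python
def displayed_total_batch_size(per_device_bs, grad_accum, world_size):
    # what a naive progress log prints: it treats every rank as a data-parallel rank
    return per_device_bs * grad_accum * world_size

def effective_total_batch_size(per_device_bs, grad_accum, world_size, tp_size):
    # with tensor parallelism, only world_size // tp_size ranks consume distinct batches
    dp_world_size = world_size // tp_size
    return per_device_bs * grad_accum * dp_world_size

# e.g. per_device_bs=4, grad_accum=1, world_size=8, tp_size=2
# displayed: 32, effective: 16
```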

Hi, @hijkzzz, glad to see you're interested in this. This setup is TP-first: for example, with 4 ranks (0,1,2,3) and tp_size=2, [0,1] and [2,3] are the TP groups, and [0,2] and...
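A minimal sketch of that TP-first grouping (`build_groups` is a hypothetical helper, not the actual DeepSpeed API):

```python
def build_groups(world_size: int, tp_size: int):
    """TP-first layout: consecutive ranks form a TP group; strided ranks form a DP group."""
    tp_groups = [list(range(s, s + tp_size)) for s in range(0, world_size, tp_size)]
    dp_groups = [list(range(r, world_size, tp_size)) for r in range(tp_size)]
    return tp_groups, dp_groups

# world_size=4, tp_size=2
# tp_groups -> [[0, 1], [2, 3]]
# dp_groups -> [[0, 2], [1, 3]]
```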