inkcherry
> @inkcherry Is there a link to the demo code? I'm interested in the potential use case of this feature proposal. Hi @delock, FYI: https://github.com/inkcherry/stanford_alpaca/tree/tp_demo; see the latest commit msg. Due...
> @inkcherry, thanks for this PR. Are you able to provide some observed memory and latency benefits? Hi @tjruwase, I used a setup of 4xA800 80G with PyTorch version...
> I'm unable to get this to work. > > First I run: `bash run.sh zero2` (all of the options fail with the same error) > > ``` > Time...
@hwchen2017 just a reminder in case you missed this~ thanks.
> @inkcherry, thanks for the quick PR. I have a few questions > > 1. It seems this PR is a workaround using `reuse_dist_env=False` rather than fixing autotp itself. Is...
FYI @delock @Yejing-Lai
Could you try with ```replace_with_kernel_inject=False```?
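A minimal sketch of where that flag would go, assuming a DeepSpeed inference setup; the model name and tp size below are placeholders, not taken from the issue:

```python
import torch
import deepspeed
from transformers import AutoModelForCausalLM

# Hypothetical checkpoint -- substitute the model you are actually loading.
model = AutoModelForCausalLM.from_pretrained("facebook/opt-1.3b", torch_dtype=torch.float16)

# Keep the original module structure instead of swapping in fused inference kernels.
engine = deepspeed.init_inference(
    model,
    tensor_parallel={"tp_size": 2},    # assumption: match your launch world size
    dtype=torch.float16,
    replace_with_kernel_inject=False,  # the flag suggested above
)
```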
Hi @Peter-Chou, I gave it a try and it works correctly. Here's my list of checkpoint files; it looks like yours is missing some content compared to mine. It...
Hi @cynricfu, thanks for the report. This is likely due to the Transformer display logic using ```total_batch_size``` without accounting for ```dp_world_size != world_size```. You can ignore it for now —...
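A rough sketch of the arithmetic behind that remark, assuming a TP+DP layout where only the data-parallel replicas multiply the batch; all numbers below are illustrative, not from the report:

```python
# Illustrative numbers only.
world_size = 8                           # total ranks in the launch
tp_size = 2                              # ranks sharing one model replica
dp_world_size = world_size // tp_size    # 4 data-parallel replicas

micro_batch_size = 4
grad_accum_steps = 2

# What the display logic reportedly computes (scales by world_size):
displayed_total = micro_batch_size * grad_accum_steps * world_size      # 64

# What is actually consumed per optimizer step (scales by dp_world_size):
effective_total = micro_batch_size * grad_accum_steps * dp_world_size   # 32
```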
Hi, @hijkzzz, glad to see you're interested in this. This setup is TP-first, for example, with 4 ranks (0,1,2,3) and tp_size=2. So: [0,1] and [2,3] are TP groups, [0,2] and...
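A small sketch of how such a TP-first layout can be enumerated, purely as an illustration of the grouping described above (the actual group creation in the PR may differ):

```python
def build_tp_first_groups(world_size: int, tp_size: int):
    """Contiguous ranks share a TP group; ranks at the same offset
    across TP groups form the DP groups."""
    assert world_size % tp_size == 0
    dp_size = world_size // tp_size

    # TP groups: consecutive ranks, e.g. [0,1], [2,3] for world_size=4, tp_size=2
    tp_groups = [list(range(i * tp_size, (i + 1) * tp_size)) for i in range(dp_size)]

    # DP groups: ranks with the same position inside their TP group
    dp_groups = [list(range(j, world_size, tp_size)) for j in range(tp_size)]
    return tp_groups, dp_groups

print(build_tp_first_groups(4, 2))
# ([[0, 1], [2, 3]], [[0, 2], [1, 3]])
```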