tianyu-l

33 comments by tianyu-l

@wconstab is going to help set up an example multi-GPU unit test. We can add more functionality unit tests following that example.

Update: hit IMA (illegal memory access) issues with both my implementation #296 and @wconstab's #303. Working on debugging with @lessw2020.

> Why is a context manager needed (and why do we need to register special replacements for the operators in dispatch table) -- why can't we just add behaviors into...

A solution is proposed in https://github.com/pytorch/pytorch/issues/130646

This is under planning and will be worked on soon. Any suggestions are welcome.

Closing as #268 landed -- we are now using per-TransformerBlock compilation.

Closing this, as fused RMSNorm is now supported with Tensor Parallelism (#404).

3D parallelism was enabled in #344.

Thanks for creating this issue! In fact, we recently started working on it in #279.