Yonghao Zhuang
You can just use `var`; it wraps such an id.
It depends on your algorithm. I think the **first principle is to not increase the total communication size**. E.g., if originally we send 0>2, I cannot see any advantage in...
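To make the principle concrete, here is a minimal sketch (hypothetical helper, not Alpa code) of why forwarding a tensor through an intermediate stage, e.g. 0>1>2 instead of 0>2, increases the total bytes moved:

```python
def total_comm_bytes(edges, tensor_bytes):
    """Total bytes moved if the same tensor traverses each edge once."""
    return len(edges) * tensor_bytes

TENSOR_BYTES = 4 * 1024 * 1024  # a 4 MB activation, for illustration

direct = [(0, 2)]            # send 0>2 directly
forwarded = [(0, 1), (1, 2)]  # route 0>1, then 1>2

# The forwarded plan doubles the total communication size.
assert total_comm_bytes(forwarded, TENSOR_BYTES) == 2 * total_comm_bytes(direct, TENSOR_BYTES)
```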
The heuristic works for the first scenario. In the above (0>1 & 0>2) case, we don't need to add it to 1's outvars. You can read the PipelineInstEmitter for more...
A var corresponds to a logical tensor, including all of its shards. In the "pipeline pass", we only decide how the computational graph is divided into pipeline stages, but not the...
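A minimal sketch of the separation of concerns described above (hypothetical `Var` and `shard` names, not Alpa's actual data structures): the pipeline pass only groups logical tensors into stages; a later pass decides how each logical tensor is split into shards.

```python
from dataclasses import dataclass

@dataclass
class Var:
    """A logical tensor: one name/shape covering all of its future shards."""
    name: str
    shape: tuple

# Pipeline pass (illustrative): assign vars to stages, no sharding decided yet.
stages = {
    0: [Var("x", (1024, 1024))],
    1: [Var("y", (1024, 512))],
}

def shard(var, num_shards):
    """A later sharding pass splits one logical var along axis 0."""
    rows = var.shape[0] // num_shards
    return [Var(f"{var.name}@{i}", (rows, *var.shape[1:]))
            for i in range(num_shards)]
```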
This seems to be an OOM error. If the model is not very large, maybe it is related to [this issue](https://jax.readthedocs.io/en/latest/gpu_memory_allocation.html). For example, the XLA from jaxlib will try to get 90%...
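If it is the preallocation behavior from the linked page, the documented environment variables can be set before JAX initializes the GPU backend, e.g.:

```python
import os

# Must be set BEFORE importing jax / before jaxlib touches the GPU.
# Disable preallocation entirely (allocate on demand instead):
os.environ["XLA_PYTHON_CLIENT_PREALLOCATE"] = "false"
# ...or keep preallocation but cap it at a fraction of GPU memory:
os.environ["XLA_PYTHON_CLIENT_MEM_FRACTION"] = "0.7"
```

Equivalently, export the same variables in the shell before launching the script.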
Actually, a fix for the bug is WIP: https://github.com/alpa-projects/alpa/pull/807
@cksmll Could you please try the nightly alpa after https://github.com/alpa-projects/alpa/pull/807?
It seems like another bug... I'll try to fix it.
@cksmll In Alpa, we monkey-patch the RNG from JAX's stateless version to TF's stateful one, and JAX uses a specific dtype for its RNG keys, which is not handled. As...
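For readers unfamiliar with the stateless-vs-stateful distinction, here is a toy sketch of that kind of monkey patch (invented names, pure Python; the real patch swaps JAX's key-based RNG for a TF stateful generator):

```python
class StatefulRNG:
    """Toy stateful generator (stand-in for TF's stateful RNG)."""
    def __init__(self, seed):
        self.state = seed & 0xFFFFFFFF

    def next_uint32(self):
        # One LCG step; the real patch delegates to TF instead.
        self.state = (1664525 * self.state + 1013904223) & 0xFFFFFFFF
        return self.state

def stateless_uniform(key):
    """Stateless API: the output depends only on the key (JAX-style)."""
    return (1664525 * key + 1013904223) & 0xFFFFFFFF

_global_rng = StatefulRNG(seed=42)

def stateful_uniform(key):
    """Patched replacement: ignores the key, advances global state."""
    return _global_rng.next_uint32()

# The monkey patch itself: rebind the name that callers look up,
# so existing call sites silently get the stateful behavior.
uniform = stateless_uniform
uniform = stateful_uniform
```

The dtype pitfall is exactly at this boundary: JAX's keys have a specific dtype that a patched stateful implementation has to convert or reject explicitly.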
cc @jiaodong