Yonghao Zhuang

Results 41 comments of Yonghao Zhuang

You can just use `var`. It wraps such an id.
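A minimal sketch, assuming "var" here means a JAX jaxpr variable (the `Var` objects in `jaxpr.invars`, `jaxpr.outvars` and each equation's `invars`/`outvars`). Vars are hashable, so they can key a dict directly without a separate integer id. The function `f` is a hypothetical example.

```python
import jax
import jax.numpy as jnp
from jax import lax

def f(x, y):
    # Hypothetical toy function built from lax primitives only.
    a = lax.sin(x)
    b = lax.mul(a, y)
    return lax.add(b, y)

jaxpr = jax.make_jaxpr(f)(jnp.ones(3), jnp.ones(3)).jaxpr

# Map each var to the index of the equation that produces it,
# using the Var objects themselves as keys.
producer = {}
for i, eqn in enumerate(jaxpr.eqns):
    for v in eqn.outvars:
        producer[v] = i

out = jaxpr.outvars[0]
print(jaxpr.eqns[producer[out]].primitive.name)  # add
```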

It depends on your algorithm. I think the **first principle is to not increase the total comm size**. E.g., if originally we send 0>2, I cannot see any advantage in...
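The "do not increase total comm size" principle can be sketched as a simple acceptance check. The transfer representation `(src_stage, dst_stage, num_bytes)`, the helper names, and the relayed alternative plan are all hypothetical illustrations, not Alpa's API or the plan discussed above.

```python
from typing import List, Tuple

# Hypothetical representation: (src_stage, dst_stage, num_bytes).
Transfer = Tuple[int, int, int]

def total_comm_size(plan: List[Transfer]) -> int:
    """Sum of bytes moved across all transfers in a plan."""
    return sum(nbytes for _, _, nbytes in plan)

def accept_rewrite(original: List[Transfer], candidate: List[Transfer]) -> bool:
    """Accept a candidate plan only if it does not increase total communication."""
    return total_comm_size(candidate) <= total_comm_size(original)

size = 4 * 1024 * 1024                       # a hypothetical 4 MiB tensor
direct = [(0, 2, size)]                      # send 0>2 directly
relayed = [(0, 1, size), (1, 2, size)]       # hypothetical alternative routing

print(accept_rewrite(direct, relayed))  # False: relaying doubles the bytes moved
```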

The heuristic works for the first scenario. In the above (0>1 & 0>2) case, we don't need to add it to 1's outvars. You can read the PipelineInstEmitter for more...
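A simplified sketch of the (0>1 & 0>2) case, under the assumption that every consumer stage receives the var directly from the stage that produces it, so only the producer exports it and stage 1 never needs it in its outvars. The function `plan_direct_sends` and its inputs are hypothetical; the real logic lives in PipelineInstEmitter and is not reproduced here.

```python
from typing import Dict, List, Set, Tuple

def plan_direct_sends(
    producer: Dict[str, int], consumers: Dict[str, Set[int]]
) -> Tuple[List[Tuple[str, int, int]], Dict[int, List[str]]]:
    """Hypothetical: route each var straight from its producer stage."""
    sends: List[Tuple[str, int, int]] = []
    outvars: Dict[int, List[str]] = {}
    for var, src in producer.items():
        dsts = sorted(d for d in consumers.get(var, set()) if d != src)
        if not dsts:
            continue
        # Only the producing stage exports the var; intermediate
        # consumer stages do not forward it.
        outvars.setdefault(src, []).append(var)
        for dst in dsts:
            sends.append((var, src, dst))
    return sends, outvars

sends, outvars = plan_direct_sends({"x": 0}, {"x": {1, 2}})
print(sends)    # [('x', 0, 1), ('x', 0, 2)]
print(outvars)  # {0: ['x']}
```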

A var corresponds to a logical tensor including all its shards. In the "pipeline pass", we only decide how the computational graph is divided into pipeline stages, but not the...
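To illustrate that a var is a whole logical tensor, here is a minimal sketch that splits a jaxpr into two stages at a hypothetical boundary (right after `tanh`) and finds the vars crossing it. Each crossing var's `aval` carries the full logical shape; nothing in the stage split says how it is sharded. The `mlp` function and the cut point are assumptions for illustration only.

```python
import jax
import jax.numpy as jnp
from jax import lax

def mlp(x, w1, w2):
    # Hypothetical two-layer model built from lax primitives.
    h = lax.tanh(lax.dot(x, w1))
    return lax.dot(h, w2)

x, w1, w2 = jnp.ones((8, 4)), jnp.ones((4, 16)), jnp.ones((16, 2))
jaxpr = jax.make_jaxpr(mlp)(x, w1, w2).jaxpr

# Hypothetical stage boundary: everything up to and including tanh is stage 0.
cut = next(i for i, e in enumerate(jaxpr.eqns) if e.primitive.name == "tanh") + 1
stage0, stage1 = jaxpr.eqns[:cut], jaxpr.eqns[cut:]

# Vars produced in stage 0 that stage 1 (or the final output) consumes.
used_later = {v for e in stage1 for v in e.invars} | set(jaxpr.outvars)
crossing = [v for e in stage0 for v in e.outvars if v in used_later]

for v in crossing:
    print(v.aval.shape)  # (8, 16): the full logical shape, not a shard
```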

This seems to be an OOM error. If the model is not very large, it may be related to [this issue](https://jax.readthedocs.io/en/latest/gpu_memory_allocation.html). For example, the XLA in jaxlib will try to get 90%...
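The linked JAX page documents environment variables that control GPU preallocation. A minimal sketch for a plain JAX process: they must be set before JAX initializes its GPU backend. Whether and how these propagate to Alpa's distributed workers is not covered here and is an assumption to verify.

```python
import os

# Disable preallocation: allocate GPU memory on demand instead.
os.environ["XLA_PYTHON_CLIENT_PREALLOCATE"] = "false"

# Alternatively, keep preallocation but cap the fraction it grabs:
# os.environ["XLA_PYTHON_CLIENT_MEM_FRACTION"] = ".50"

import jax  # import only after the variables are set
```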

Actually, a fix for this bug is WIP: https://github.com/alpa-projects/alpa/pull/807

@cksmll Could you please try the nightly Alpa build after https://github.com/alpa-projects/alpa/pull/807?

@cksmll In Alpa, we monkey-patch the RNG, replacing JAX's stateless version with TF's stateful one, and JAX uses a specific dtype for its RNG, which is not handled. As...
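For context, a minimal sketch of the difference between the two RNG styles; it does not reproduce Alpa's patch. With the default threefry2x32 implementation, JAX's legacy `jax.random.PRNGKey` returns a `uint32` array of shape `(2,)`, the kind of key value a stateful replacement has to accept. Whether this is exactly the dtype meant above is an assumption. NumPy's `Generator` stands in for a TF-style stateful generator.

```python
import jax
import jax.numpy as jnp
import numpy as np

# JAX (stateless): the key is an explicit value. The same key always
# yields the same numbers; fresh randomness comes from splitting.
key = jax.random.PRNGKey(0)
print(key.dtype, key.shape)  # uint32 (2,) with the default threefry2x32 impl
a = jax.random.normal(key, (3,))
b = jax.random.normal(key, (3,))
key, sub = jax.random.split(key)
c = jax.random.normal(sub, (3,))

# Stateful (NumPy as a stand-in): the generator mutates internal state,
# so repeated calls return different numbers without passing a key.
rng = np.random.default_rng(0)
d = rng.normal(size=3)
e = rng.normal(size=3)
```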