Yonghao Zhuang
You can just use `var`; it wraps such an id.
It depends on your algorithm. I think the **first principle is to not increase the total communication size**. E.g., if originally we send 0>2, I cannot see any advantage in...
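To make the principle concrete, here is a minimal sketch (hypothetical helper, not Alpa code) of why forwarding a tensor through an intermediate stage, e.g. 0>1>2 instead of 0>2, increases the total bytes moved:

```python
def total_comm_bytes(edges, tensor_bytes):
    """Total bytes moved if the same tensor traverses each edge once."""
    return len(edges) * tensor_bytes

TENSOR_BYTES = 4 * 1024 * 1024  # a 4 MB activation, for illustration

direct = [(0, 2)]            # send 0>2 directly
forwarded = [(0, 1), (1, 2)]  # route 0>1, then 1>2

# The forwarded plan doubles the total communication size.
assert total_comm_bytes(forwarded, TENSOR_BYTES) == 2 * total_comm_bytes(direct, TENSOR_BYTES)
```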
The heuristic works for the first scenario. In the above (0>1 & 0>2) case, we don't need to add it to 1's outvars. You can read the PipelineInstEmitter for more...
A var corresponds to a logical tensor, including all of its shards. In the "pipeline pass", we only decide how the computational graph is divided into pipeline stages, but not the...
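A minimal sketch of the separation of concerns described above (hypothetical `Var` and `shard` names, not Alpa's actual data structures): the pipeline pass only groups logical tensors into stages; a later pass decides how each logical tensor is split into shards.

```python
from dataclasses import dataclass

@dataclass
class Var:
    """A logical tensor: one name/shape covering all of its future shards."""
    name: str
    shape: tuple

# Pipeline pass (illustrative): assign vars to stages, no sharding decided yet.
stages = {
    0: [Var("x", (1024, 1024))],
    1: [Var("y", (1024, 512))],
}

def shard(var, num_shards):
    """A later sharding pass splits one logical var along axis 0."""
    rows = var.shape[0] // num_shards
    return [Var(f"{var.name}@{i}", (rows, *var.shape[1:]))
            for i in range(num_shards)]
```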
This seems to be an OOM error. If the model is not very large, maybe it is related to [this issue](https://jax.readthedocs.io/en/latest/gpu_memory_allocation.html). For example, the XLA from jaxlib will try to get 90%...
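If it is the preallocation behavior from the linked page, the documented environment variables can be set before JAX initializes the GPU backend, e.g.:

```python
import os

# Must be set BEFORE importing jax / before jaxlib touches the GPU.
# Disable preallocation entirely (allocate on demand instead):
os.environ["XLA_PYTHON_CLIENT_PREALLOCATE"] = "false"
# ...or keep preallocation but cap it at a fraction of GPU memory:
os.environ["XLA_PYTHON_CLIENT_MEM_FRACTION"] = "0.7"
```

Equivalently, export the same variables in the shell before launching the script.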
Actually, a fix for the bug is WIP: https://github.com/alpa-projects/alpa/pull/807
@cksmll Could you please try the nightly alpa after https://github.com/alpa-projects/alpa/pull/807?
It seems like another bug... I'll try to fix it.
@cksmll In Alpa, we monkey-patch the RNG from JAX's stateless version to TF's stateful one, and JAX uses a specific dtype for its RNG keys, which is not handled. As...
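For readers unfamiliar with the stateless-vs-stateful distinction, here is a toy sketch of that kind of monkey patch (invented names, pure Python; the real patch swaps JAX's key-based RNG for a TF stateful generator):

```python
class StatefulRNG:
    """Toy stateful generator (stand-in for TF's stateful RNG)."""
    def __init__(self, seed):
        self.state = seed & 0xFFFFFFFF

    def next_uint32(self):
        # One LCG step; the real patch delegates to TF instead.
        self.state = (1664525 * self.state + 1013904223) & 0xFFFFFFFF
        return self.state

def stateless_uniform(key):
    """Stateless API: the output depends only on the key (JAX-style)."""
    return (1664525 * key + 1013904223) & 0xFFFFFFFF

_global_rng = StatefulRNG(seed=42)

def stateful_uniform(key):
    """Patched replacement: ignores the key, advances global state."""
    return _global_rng.next_uint32()

# The monkey patch itself: rebind the name that callers look up,
# so existing call sites silently get the stateful behavior.
uniform = stateless_uniform
uniform = stateful_uniform
```

The dtype pitfall is exactly at this boundary: JAX's keys have a specific dtype that a patched stateful implementation has to convert or reject explicitly.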
cc @jiaodong