Yonghao Zhuang
I'm writing an assembler to generate relocatable object files, and I'm using the dump function of rv8 to debug (by the way, its output format is really nice). The file is little-endian. But I've...
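As a minimal sketch of the little-endian point (not the assembler in question), the snippet below encodes a single RISC-V I-type instruction and packs it into the 4-byte little-endian order an object-file section would store; the instruction choice and helper name are illustrative assumptions.

```python
import struct

# Minimal sketch: encode a RISC-V `addi rd, rs1, imm` (I-type) instruction
# and emit it as 4 little-endian bytes, i.e. the byte order the object file
# stores and a dump tool reads back.
def encode_addi(rd: int, rs1: int, imm: int) -> bytes:
    opcode = 0x13                      # OP-IMM
    funct3 = 0x0                       # ADDI
    word = ((imm & 0xFFF) << 20) | (rs1 << 15) | (funct3 << 12) | (rd << 7) | opcode
    return struct.pack("<I", word)     # "<" forces little-endian byte order

# addi a0, a0, 1  ->  bytes 13 05 15 00
print(encode_addi(rd=10, rs1=10, imm=1).hex())
```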
In `jax.remat`, constant values and random numbers are generated in the forward pass and stored until the backward pass. An example is [this](https://gist.github.com/ZYHowell/96e31b8e43ec37a9ddfaac4aa1a559aa). To reduce memory consumption, we remat this...
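A minimal sketch of the behavior being discussed (not the linked gist): a layer whose forward pass draws a random mask. Without remat the mask and other intermediates are kept alive until the backward pass; wrapping the layer in `jax.remat` recomputes them during the backward pass instead. The layer shape and dropout rate are illustrative assumptions.

```python
import jax
import jax.numpy as jnp

def layer(w, x, key):
    # Random value produced in the forward pass; with remat it is regenerated
    # (from the same key) during the backward pass rather than stored.
    mask = jax.random.bernoulli(key, 0.9, x.shape)
    return jnp.sum(jnp.tanh(x @ w) * mask)

remat_layer = jax.remat(layer)   # alias of jax.checkpoint

w = jnp.ones((128, 128))
x = jnp.ones((32, 128))
key = jax.random.PRNGKey(0)
grads = jax.grad(remat_layer)(w, x, key)
print(grads.shape)
```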
Each microbatch runs the traced Jaxpr once. However, some parts of the Jaxpr are not related to the microbatch. This results in redundant computation and incorrect behavior. For example: ```python def...
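The example code in the snippet above is truncated, so here is a hypothetical illustration of the kind of loss function meant: the L2 penalty depends only on the parameters, not on the microbatch, yet replaying the whole Jaxpr per microbatch recomputes it (and effectively accumulates it) once per microbatch.

```python
import jax.numpy as jnp

def loss_fn(params, microbatch):
    pred = microbatch @ params
    data_loss = jnp.mean(pred ** 2)        # depends on the microbatch
    l2 = 1e-4 * jnp.sum(params ** 2)       # does NOT depend on the microbatch
    return data_loss + l2

params = jnp.ones((4,))
batch = jnp.ones((2, 4))
print(loss_fn(params, batch))
```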
This can be a starting point to learn `runtime_emitter` and `cross_mesh_resharding`. Background --- In Pipeshard Parallel, when a tensor needs to be received from a mesh, we always choose...
Background --- Alpa initializes a collective group for each cross-mesh communication pair. The call stack to initialize a collective group is: [`create_collective_group`](https://github.com/alpa-projects/alpa/blob/54c585c0e897914d7078d6f0243d12a19d1733f4/alpa/collective/collective.py#L169) or [`init_collective_group`](https://github.com/alpa-projects/alpa/blob/54c585c0e897914d7078d6f0243d12a19d1733f4/alpa/collective/collective.py#L138) in `collective.py`, which calls [`create_collective_group`](https://github.com/alpa-projects/alpa/blob/54c585c0e897914d7078d6f0243d12a19d1733f4/alpa/collective/collective.py#L72) of the `GroupManager` class...
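A simplified, illustrative sketch of the call-stack pattern described above (not Alpa's actual implementation; all names and fields here are assumptions): module-level helpers delegate to a singleton `GroupManager`, which creates and caches one collective group per name, i.e. per cross-mesh communication pair.

```python
class GroupManager:
    """Creates and caches collective groups, keyed by group name."""

    def __init__(self):
        self._groups = {}   # group_name -> group object

    def create_collective_group(self, world_size, rank, group_name):
        if group_name in self._groups:
            return self._groups[group_name]
        # Placeholder for the real group setup (e.g. NCCL communicator init).
        group = {"world_size": world_size, "rank": rank, "name": group_name}
        self._groups[group_name] = group
        return group

_group_mgr = GroupManager()

def init_collective_group(world_size, rank, group_name="default"):
    # Module-level entry point that delegates to the singleton manager.
    return _group_mgr.create_collective_group(world_size, rank, group_name)
```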
This can be a starting point to learn `runtime_emitter`. Background --- In Pipeshard Parallel, the final compilation step is to interpret the solution into [a configuration](https://github.com/alpa-projects/alpa/blob/fcd560d58e680b6d3c5098504242b49f527549ee/alpa/pipeline_parallel/runtime_emitter.py#L228-L255) containing all information about...
I'm running the Megatron-LM [BERT example](https://github.com/NVIDIA/Megatron-LM/blob/main/pretrain_bert.py) with Wikipedia data, and observed a loss divergence between TE v1.1 and v1.2. I then debugged by fixing the Megatron-LM/torch version and binary searched...
## Motivation When serving an extremely large model (e.g. Llama 400B), the number of GPUs might exceed the number of KV heads. This leads to replication of the KV cache, which is troublesome...
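A back-of-the-envelope sketch of the replication problem; the head count and tensor-parallel degree below are illustrative assumptions, not properties of any particular model.

```python
# With plain tensor parallelism, each KV head must be present on every GPU
# of the group that shares it, so once tp_degree > num_kv_heads the KV cache
# gets replicated.
num_kv_heads = 8     # e.g. a GQA model with 8 KV heads (assumption)
tp_degree = 16       # GPUs used to serve the model (assumption)

replication = max(1, tp_degree // num_kv_heads)
print(f"each KV head's cache is replicated on {replication} GPUs")  # -> 2
```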