Issues by Hao Zhang

Results: 19 issues


Hao Zhang

[Roadmap] Inference performance (for OPT/GPT)

To achieve state-of-the-art serving performance on OPT/GPT, we need to develop the following features, sorted by priority. Task 1: Align single-GPU decoding performance with FasterTransformer. Task...

enhancement

[PERF] Port the manually optimized CUDA kernels into Alpa

It is worth porting the manually optimized fused CUDA kernels from [Megatron-LM](https://github.com/NVIDIA/Megatron-LM/tree/main/megatron/fused_kernels) and [FasterTransformer](https://github.com/NVIDIA/FasterTransformer). Based on my observations, they seem to add 3-8 TFLOPS of throughput on GPT-3. To do this,...

enhancement

good first issue

Hao Zhang

[Roadmap] Inference performance (for OPT/GPT)

[PERF] Port the manually optimized CUDA kernels into Alpa

[WIP] Experimental AutoStrategy

Add Pipeline parallelism strategy to AutoDist

Improve graph transformation performance on large graphs

Decouple compression semantic from Allreduce synchronizer

Support TensorFlow 2.3.0

Add Poseidon strategy

iteration-level scheduler

[FEATURE] OPT-175B service authentication and new priority queue