Hao Zhang

Results 19 issues of Hao Zhang

To achieve state-of-the-art serving performance on OPT/GPT, we need to develop the following features, sorted by priority. ## Task 1: Align single-GPU decoding performance with FasterTransformer. ### Task...

enhancement

It is worth porting the manually optimized fused CUDA kernels from [Megatron-LM](https://github.com/NVIDIA/Megatron-LM/tree/main/megatron/fused_kernels) and [FasterTransformer](https://github.com/NVIDIA/FasterTransformer). Based on my observations, they seem to improve throughput by 3-8 TFLOPS on GPT-3. To do this,...

enhancement
good first issue

Adds the experimental AutoStrategy, which automatically determines the best strategy to use for a given model and resource spec.

**System information** - AutoDist version: - Are you willing to contribute it (Yes/No): **Describe the new feature and the current behavior/state** Pipeline parallelism explained: https://arxiv.org/abs/1811.06965 **Will this change the current...

**System information** - AutoDist version: - Are you willing to contribute it (Yes/No): **Describe the new feature and the current behavior/state** The current graph transformation (in-graph specifically) will be unacceptably...

**System information** - AutoDist version: - Are you willing to contribute it (Yes/No): **Describe the new feature and the current behavior/state** **Will this change the current API? How?** **Describe alternatives...

**System information** - AutoDist version: v0.6.0 - Are you willing to contribute it (Yes/No): **Describe the new feature and the current behavior/state** Currently we support TensorFlow

**System information** - AutoDist version: - Are you willing to contribute it (Yes/No): **Describe the new feature and the current behavior/state** **Will this change the current API? How?** **Describe alternatives...

1. Create authentication: whoever deploys the OPT-175B service (the Alpa team) can create API keys and distribute them to partner users. Partner users who have those keys can...
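
The key-creation step in item 1 could be sketched roughly as follows. This is a minimal illustration, not the Alpa implementation: the `ApiKeyStore` class, its method names, and the in-memory dictionary are all hypothetical, and the check performed in `lookup` is only an assumption about how a presented key might be validated. The sketch uses Python's standard `secrets`, `hashlib`, and `hmac` modules, and stores only a digest of each key so the plaintext key exists solely with the partner it was issued to.

```python
import hashlib
import hmac
import secrets


class ApiKeyStore:
    """Hypothetical in-memory key store (illustrative only, not part of Alpa)."""

    def __init__(self):
        # Maps the SHA-256 hex digest of a key to the partner it was issued to.
        self._digests = {}

    @staticmethod
    def _digest(key: str) -> str:
        return hashlib.sha256(key.encode("utf-8")).hexdigest()

    def create_key(self, partner: str) -> str:
        """Generate a random key for a partner and remember only its digest."""
        key = secrets.token_urlsafe(32)
        self._digests[self._digest(key)] = partner
        return key

    def lookup(self, presented_key: str):
        """Return the partner name for a known key, or None if it is unknown."""
        presented = self._digest(presented_key)
        for stored, partner in self._digests.items():
            # Constant-time comparison of the two hex digests.
            if hmac.compare_digest(stored, presented):
                return partner
        return None


# The deployer creates a key and hands it to a partner user.
store = ApiKeyStore()
partner_key = store.create_key("partner-a")
```

A real deployment would presumably persist the digests and support revoking keys; both are left out here to keep the sketch short.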