Hao Zhang
To achieve state-of-the-art serving performance on OPT/GPT, we need to develop the following features, sorted by priority. ## Task 1: Align single-GPU decoding performance with FasterTransformer. ### Task...
It is worth porting the manually optimized fused CUDA kernels from [Megatron-LM](https://github.com/NVIDIA/Megatron-LM/tree/main/megatron/fused_kernels) and [FasterTransformer](https://github.com/NVIDIA/FasterTransformer). Based on my observations, they seem to add 3-8 TFLOPS on GPT-3. To do this,...
Add the experimental AutoStrategy, which automatically determines the best strategy to use for a given model and resource spec.
**System information** - AutoDist version: - Are you willing to contribute it (Yes/No): **Describe the new feature and the current behavior/state** Pipeline parallelism explained: https://arxiv.org/abs/1811.06965 **Will this change the current...
**System information** - AutoDist version: - Are you willing to contribute it (Yes/No): **Describe the new feature and the current behavior/state** The current graph transformation (in-graph specifically) will be unacceptably...
**System information** - AutoDist version: - Are you willing to contribute it (Yes/No): **Describe the new feature and the current behavior/state** **Will this change the current API? How?** **Describe alternatives...
**System information** - AutoDist version: v0.6.0 - Are you willing to contribute it (Yes/No): **Describe the new feature and the current behavior/state** Currently we support TensorFlow
**System information** - AutoDist version: - Are you willing to contribute it (Yes/No): **Describe the new feature and the current behavior/state** **Will this change the current API? How?** **Describe alternatives...
1. Create authentication: whoever deploys the OPT-175B service (the Alpa team) can create API keys and distribute them to partner users. Partner users who have those keys can...
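The key creation and checking described in this step could look roughly like the sketch below. It is not taken from the Alpa codebase: the `ApiKeyStore` class, its method names, and the in-memory dictionary are all hypothetical, and it assumes that storing only a SHA-256 digest of each key is acceptable for the service.

```python
import hashlib
import hmac
import secrets


class ApiKeyStore:
    """Hypothetical in-memory registry of API keys (illustrative names only)."""

    def __init__(self):
        # Maps the SHA-256 hex digest of each issued key to its partner name,
        # so the raw key is never kept on the server side.
        self._digests = {}

    @staticmethod
    def _digest(key):
        return hashlib.sha256(key.encode("utf-8")).hexdigest()

    def create_key(self, partner):
        """Issue a new random key for a partner and return it once."""
        key = secrets.token_urlsafe(32)
        self._digests[self._digest(key)] = partner
        return key

    def authenticate(self, key):
        """Return the partner name for a valid key, or None otherwise."""
        candidate = self._digest(key)
        for stored, partner in self._digests.items():
            # Constant-time comparison of the two hex digests.
            if hmac.compare_digest(stored, candidate):
                return partner
        return None


if __name__ == "__main__":
    store = ApiKeyStore()
    issued = store.create_key("partner-a")
    print(store.authenticate(issued))
    print(store.authenticate("not-a-real-key"))
```

In this sketch the deployer calls `create_key` once per partner and hands the returned string over out of band; the serving endpoint would then call `authenticate` on the key sent with each request. A real deployment would need persistent storage and key revocation, which are left out here.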