Question about GPT-3 performance
FasterTransformer is great work. I want to ask some questions about the GPT-3 benchmark.
- How was the performance of GPT-3 (175B, 89B, 20B, 6.7B, 1.3B, ...) tested? Was it based on the real models? Which scripts were used?
- How can users test Megatron performance? Are the test scripts available?
- It is tested with random model weights. We launch the Triton server and measure the latency of queries with different batch sizes and sequence lengths. More details about setting up the Triton server are in https://github.com/triton-inference-server/fastertransformer_backend. You can get close results directly from the examples of that repo (see the sketch after this list).
- You can refer to, and ask in, the Megatron-LM repo. We modified the evaluation scripts provided in Megatron-LM at the time, but we don't keep those scripts in the FT repo now.
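For reference, here is a minimal sketch of that kind of client-side measurement, assuming the model is served under the name `fastertransformer` on the default HTTP port and uses the tensor names seen in the fastertransformer_backend examples (`input_ids`, `input_lengths`, `request_output_len`); check your model's `config.pbtxt` for the actual names, shapes, and dtypes before running it.

```python
# Sketch: time Triton queries at different batch sizes, assuming the
# fastertransformer_backend tensor naming convention. Not an official script.
import time
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

def measure(batch_size, input_len, output_len, warmup=3, iters=10):
    # Random token ids are fine here: the benchmark above also used random
    # weights, so only the timing matters, not the generated text.
    input_ids = np.random.randint(0, 50000, size=(batch_size, input_len),
                                  dtype=np.uint32)
    input_lengths = np.full((batch_size, 1), input_len, dtype=np.uint32)
    request_output_len = np.full((batch_size, 1), output_len, dtype=np.uint32)

    inputs = []
    for name, data in [("input_ids", input_ids),
                       ("input_lengths", input_lengths),
                       ("request_output_len", request_output_len)]:
        t = httpclient.InferInput(name, list(data.shape), "UINT32")
        t.set_data_from_numpy(data)
        inputs.append(t)

    for _ in range(warmup):                      # warm up the server
        client.infer("fastertransformer", inputs)
    start = time.time()
    for _ in range(iters):                       # timed runs
        client.infer("fastertransformer", inputs)
    return (time.time() - start) / iters

for bs in [1, 8, 32]:
    print(f"batch {bs}: {measure(bs, 512, 32) * 1000:.1f} ms/query")
```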
About question 1: if we just test on one machine, do we still need to use Triton? Is there any difference between the performance measured through Triton and by the script https://github.com/NVIDIA/FasterTransformer/blob/main/examples/cpp/multi_gpu_gpt/multi_gpu_gpt_example.cc ?
The example runs the C++ code directly, so it does not include the send/recv overhead of serving.
Closing this issue because it is inactive. Feel free to re-open it if you still have any problems.