Question about GPT-3 performance
FasterTransformer is great work. I want to ask some questions about the GPT-3 benchmark.
- How was the performance of GPT-3 (175B, 89B, 20B, 6.7B, 1.3B, ...) tested? Was it based on the real models? Which scripts were used?
- How can users test Megatron performance? Are the test scripts available?
- It is tested with random model weights. We launch the Triton server and measure the latency of queries with different batch sizes and sequence lengths. More details about setting up the Triton server are in https://github.com/triton-inference-server/fastertransformer_backend. You can get close results directly from the examples of that repo (see the sketch after this list).
- You can refer to, and ask in, the Megatron-LM repo. We modified the evaluation scripts provided in Megatron-LM at the time, but we don't keep those scripts in the FT repo now.
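For reference, here is a minimal sketch of that kind of client-side measurement, assuming the model is served under the name `fastertransformer` on the default HTTP port and uses the tensor names seen in the fastertransformer_backend examples (`input_ids`, `input_lengths`, `request_output_len`); check your model's `config.pbtxt` for the actual names, shapes, and dtypes before running it.

```python
# Sketch: time Triton queries at different batch sizes, assuming the
# fastertransformer_backend tensor naming convention. Not an official script.
import time
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

def measure(batch_size, input_len, output_len, warmup=3, iters=10):
    # Random token ids are fine here: the benchmark above also used random
    # weights, so only the timing matters, not the generated text.
    input_ids = np.random.randint(0, 50000, size=(batch_size, input_len),
                                  dtype=np.uint32)
    input_lengths = np.full((batch_size, 1), input_len, dtype=np.uint32)
    request_output_len = np.full((batch_size, 1), output_len, dtype=np.uint32)

    inputs = []
    for name, data in [("input_ids", input_ids),
                       ("input_lengths", input_lengths),
                       ("request_output_len", request_output_len)]:
        t = httpclient.InferInput(name, list(data.shape), "UINT32")
        t.set_data_from_numpy(data)
        inputs.append(t)

    for _ in range(warmup):                      # warm up the server
        client.infer("fastertransformer", inputs)
    start = time.time()
    for _ in range(iters):                       # timed runs
        client.infer("fastertransformer", inputs)
    return (time.time() - start) / iters

for bs in [1, 8, 32]:
    print(f"batch {bs}: {measure(bs, 512, 32) * 1000:.1f} ms/query")
```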
About question 1: if we just test on one machine, do we still need to use Triton? Is there any difference between the performance measured through Triton and by the script https://github.com/NVIDIA/FasterTransformer/blob/main/examples/cpp/multi_gpu_gpt/multi_gpu_gpt_example.cc ?
The example runs the C++ code directly, so it does not include the send/recv overhead of serving.
Closing this issue because it is inactive. Feel free to re-open it if you still have any problems.