
Question about GPT-3 performance

Open hawkwm opened this issue 3 years ago • 3 comments

FasterTransformer is a great project. I'd like to ask some questions about the GPT-3 benchmarks.

  1. How were the GPT-3 performance numbers (175B, 89B, 20B, 6.7B, 1.3B, ...) measured? Were they based on the real models? Which scripts were used?

  2. How is Megatron's performance tested? Are the test scripts available to users?

hawkwm avatar May 16 '22 03:05 hawkwm

  1. It is tested with random model weights. We launch the Triton server and measure the latency of queries at different batch sizes and sequence lengths. More details about setting up the Triton server are in https://github.com/triton-inference-server/fastertransformer_backend. You can get close results by running the examples in this repo directly (a sketch of such a timing client follows below).
  2. You can refer to, and ask in, the Megatron-LM repo. We modified the evaluation scripts provided by Megatron-LM at the time, but we no longer keep those scripts in the FT repo.
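
As a rough illustration of point 1 (a minimal sketch, not the team's actual benchmark script): a timing client against the Triton HTTP endpoint could look like the code below. The model name "fastertransformer" and the tensor names input_ids, input_lengths, and request_output_len are assumptions based on the fastertransformer_backend examples; adjust them to match your model config.

```python
# Sketch: measure query latency against a running Triton server with the
# FasterTransformer backend, at a given batch size and sequence length.
import time
import numpy as np
import tritonclient.http as httpclient

URL = "localhost:8000"
MODEL = "fastertransformer"           # assumed model name
BATCH, SEQ_LEN, OUT_LEN = 8, 512, 32  # example batch size / sequence lengths

client = httpclient.InferenceServerClient(url=URL)

# Random token ids stand in for real prompts, mirroring the random-weight setup.
input_ids = np.random.randint(0, 50000, size=(BATCH, SEQ_LEN), dtype=np.uint32)
input_lengths = np.full((BATCH, 1), SEQ_LEN, dtype=np.uint32)
output_len = np.full((BATCH, 1), OUT_LEN, dtype=np.uint32)

inputs = []
for name, data in [("input_ids", input_ids),
                   ("input_lengths", input_lengths),
                   ("request_output_len", output_len)]:
    t = httpclient.InferInput(name, list(data.shape), "UINT32")
    t.set_data_from_numpy(data)
    inputs.append(t)

# Warm up once, then time repeated queries and report the mean latency.
client.infer(MODEL, inputs)
n = 10
start = time.perf_counter()
for _ in range(n):
    client.infer(MODEL, inputs)
print(f"mean latency: {(time.perf_counter() - start) / n * 1000:.1f} ms")
```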

byshiue avatar May 16 '22 03:05 byshiue

Regarding 1: if testing on just a single machine, is Triton still needed? Is there any difference between the performance measured through Triton and the performance measured by the script https://github.com/NVIDIA/FasterTransformer/blob/main/examples/cpp/multi_gpu_gpt/multi_gpu_gpt_example.cc ?

hawkwm avatar May 16 '22 04:05 hawkwm

That example runs the C++ API directly. It does not include the send/recv overhead of serving.
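
One way to see that gap in practice (a sketch under the same assumed model name as above, not from this thread): compare the client-measured round-trip latency with the server-side times that Triton reports through its statistics endpoint. The residual between them approximates the send/recv and serving overhead that the direct C++ example does not include. The field names below follow Triton's model-statistics schema.

```python
# Sketch: separate server-side compute time from serving overhead using
# Triton's statistics endpoint, after issuing some requests (see sketch above).
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")
stats = client.get_inference_statistics(model_name="fastertransformer")

s = stats["model_stats"][0]["inference_stats"]
count = s["success"]["count"]
if count:
    total_ms = s["success"]["ns"] / count / 1e6          # end-to-end on the server
    compute_ms = s["compute_infer"]["ns"] / count / 1e6  # forward pass only
    # The gap between the client round-trip time, total_ms, and compute_ms
    # approximates the queueing plus send/recv overhead of serving.
    print(f"server total: {total_ms:.1f} ms, compute: {compute_ms:.1f} ms")
```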

byshiue avatar May 16 '22 04:05 byshiue

Closing this issue because it is inactive. Feel free to re-open it if you still have any problems.

byshiue avatar Sep 06 '22 01:09 byshiue