FasterTransformer

question about GPT-3's computational complexity

Open mmdzzh opened this issue 4 years ago • 3 comments

I tried to calculate it and got only about 126 TFLOPs for batch_size = 1. I'm not sure whether my result is wrong, but it looks too small.

I only counted the FLOPs of multi-head attention and the FFN, since I believe those are the main compute operators.

mmdzzh avatar Sep 04 '21 05:09 mmdzzh

I don't know what input/output lengths you are using, what computational cost you expected, or what exactly you want to ask.

byshiue avatar Sep 04 '21 13:09 byshiue

I don't know what input/output lengths you are using, what computational cost you expected, or what exactly you want to ask.

The input length is 512 and the output length is 32, just the defaults.

mmdzzh avatar Sep 04 '21 13:09 mmdzzh

I don't know what input/output lengths you are using, what computational cost you expected, or what exactly you want to ask.

I've recalculated and now get only about 170 TFLOPs.

mmdzzh avatar Sep 04 '21 13:09 mmdzzh
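For reference, the kind of estimate described above (counting only the attention and FFN matmuls) can be sketched as follows. This is a hypothetical back-of-envelope calculation, not code from FasterTransformer; the model shape (96 layers, d_model = 12288, FFN width 4*d) is the published GPT-3 175B configuration, and the sequence lengths are the ones mentioned in this thread:

```python
# Rough forward-pass FLOP estimate for a GPT-3-175B-style decoder,
# counting only the multi-head-attention and FFN matrix multiplies
# (embeddings, layernorms, softmax etc. are ignored).
d, n_layers = 12288, 96            # GPT-3 175B: hidden size and layer count
seq_in, seq_out = 512, 32          # input/output lengths used in this thread
batch = 1

def flops_per_token(context_len):
    attn_proj = 8 * d * d                 # Q, K, V and output projections (2*d*d each)
    attn_scores = 4 * context_len * d     # QK^T plus attention-weighted sum of V
    ffn = 16 * d * d                      # two matmuls: d -> 4d -> d
    return n_layers * (attn_proj + attn_scores + ffn)

# Prompt processing: all 512 input tokens in one pass.
prefill = batch * seq_in * flops_per_token(seq_in)
# Generation: one token at a time over a growing context (KV cache assumed).
decode = batch * sum(flops_per_token(seq_in + i) for i in range(seq_out))

total_tflops = (prefill + decode) / 1e12
print(f"~{total_tflops:.0f} TFLOPs")   # roughly 190 TFLOPs
```

This lands in the same ballpark as the ~170 TFLOP figure above, which is consistent with the common shortcut of ~2 FLOPs per parameter per token (2 x 175e9 x 512 is about 179 TFLOPs), so a number of this magnitude for batch_size = 1 is plausible rather than an error.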