question about GPT-3's computational complexity
I tried to calculate it and got only about 126 TFLOPs with batch_size = 1. I'm not sure whether I made a mistake; the result looks too small.
I only counted the FLOPs of multi-head attention and the FFN, since I think those are the main compute operators.
I don't know what input/output lengths you used, what computing cost you expected, or what exactly you are asking.
The input length is 512 and the output length is 32, just the defaults.
I've recalculated and now get only about 170 TFLOPs.
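For what it's worth, that figure looks plausible. A minimal sketch of the estimate, assuming GPT-3 175B's published configuration (96 layers, d_model = 12288), a 512-token forward pass, and the usual convention of 2 FLOPs per multiply-add (the 2x factors below; function and exact breakdown are illustrative, not from this thread):

```python
# Rough forward-pass FLOP estimate for a GPT-3-sized decoder.
# Counts only multi-head attention and FFN matmuls; ignores
# embeddings, layer norms, softmax, and the LM head.

def transformer_flops(n_layers: int, d_model: int, seq_len: int) -> float:
    # Q, K, V, and output projections: 4 matmuls of
    # (seq_len x d_model) @ (d_model x d_model), 2 FLOPs per multiply-add
    proj = 4 * 2 * seq_len * d_model**2
    # attention scores (Q K^T) plus weighted sum (A V):
    # 2 matmuls costing 2 * seq_len^2 * d_model each
    attn = 2 * 2 * seq_len**2 * d_model
    # FFN: two matmuls through a hidden size of 4 * d_model
    ffn = 2 * 2 * seq_len * d_model * (4 * d_model)
    return n_layers * (proj + attn + ffn)

# GPT-3 175B: 96 layers, d_model = 12288
flops = transformer_flops(n_layers=96, d_model=12288, seq_len=512)
print(f"{flops / 1e12:.0f} TFLOPs")  # ~179 TFLOPs

# Cross-check with the rule of thumb of ~2 * params FLOPs per token:
print(f"{2 * 175e9 * 512 / 1e12:.0f} TFLOPs")  # ~179 TFLOPs
```

Both the layer-by-layer count and the 2 * params * tokens rule of thumb land near 179 TFLOPs for a single 512-token forward pass at batch size 1, so ~170 TFLOPs is the right order of magnitude: a few hundred teraFLOPs really is small compared with training cost, which multiplies this by the number of training tokens.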