
Triton is running too slow?

Open · bzxc opened this issue 7 months ago · 1 comment

Compared with the same structure (QKV attention) that I implemented in TensorFlow, Triton runs 10 to 20 times slower. Using Nsight Systems, I found that cudaMemcpySync takes up much of the time while Triton is executing. Do you have any ideas about what might cause that?
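One thing that may be worth ruling out for a comparison like this: Triton kernels are JIT-compiled on their first call, and autotuned kernels also benchmark their candidate configurations at that point, so a measurement that includes the first call can look much slower than steady state. Below is a minimal sketch of separating the two, not the measurement actually used in this report. The helper name `time_first_and_steady` and its parameters are hypothetical, and on a GPU the `sync` argument is assumed to be something like `torch.cuda.synchronize`.

```python
import time
from statistics import median


def time_first_and_steady(fn, warmup=3, iters=10, sync=None):
    """Return (first_call_seconds, median_steady_state_seconds) for fn().

    `sync` is an optional callable that blocks until queued device work has
    finished (assumption: e.g. torch.cuda.synchronize when timing GPU code).
    """
    def run_once():
        if sync is not None:
            sync()  # drain earlier work so it is not charged to this call
        start = time.perf_counter()
        fn()
        if sync is not None:
            sync()  # wait for the work launched by fn() to complete
        return time.perf_counter() - start

    # The first call is timed on its own: it may include one-off costs
    # such as JIT compilation or autotuning.
    first = run_once()

    # Extra warm-up calls whose timings are discarded.
    for _ in range(warmup):
        run_once()

    # Steady-state samples; the median is less sensitive to outliers.
    samples = [run_once() for _ in range(iters)]
    return first, median(samples)
```

If the first-call time dominates and the steady-state median is close to the TensorFlow number, the gap is mostly one-off overhead. If the steady-state time is still 10 to 20 times slower, the memcpy calls seen in the profile are the more likely place to look.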

I feed data like this: batch: 8, seq_len: 8192, where every sequence has the same length, and emb_size = attn_size = linear_size. As I changed the data size by a multiplier of 2

On an NVIDIA A30.
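To put the shapes above in perspective, here is a rough back-of-the-envelope sketch of how large a full attention score matrix would be at batch 8 and seq_len 8192. It assumes fp32 scores, a single head, and that the scores are materialized in memory at all; none of these are stated in the report, and a fused attention kernel would avoid storing this matrix. The function name `attention_scores_bytes` is hypothetical.

```python
def attention_scores_bytes(batch, seq_len, bytes_per_elem=4, heads=1):
    """Bytes needed for a dense (batch, heads, seq_len, seq_len) score tensor.

    Assumptions: fp32 elements (4 bytes) and one head unless overridden.
    """
    return batch * heads * seq_len * seq_len * bytes_per_elem


size = attention_scores_bytes(batch=8, seq_len=8192)
print(f"{size / 2**30:.1f} GiB")  # prints "2.0 GiB"

# Doubling seq_len quadruples the score tensor, since it grows as seq_len**2.
doubled = attention_scores_bytes(batch=8, seq_len=2 * 8192)
print(f"{doubled / 2**30:.1f} GiB")  # prints "8.0 GiB"
```

Under these assumptions the score tensor alone is a noticeable share of the A30's 24 GB of memory, and each doubling of seq_len multiplies it by four.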

bzxc · Jul 18 '24 08:07