Chen Zhengda

5 comments by Chen Zhengda

I have also encountered the same problem. Do you know how to solve it? @zt991211

Besides the error in p1 = p2 = p3 = (int)timestep - mrope_position_delta_, the current branch also produces incorrect results during batch inference. @irexyc

Hi @void-main, I've encountered the same issue with high Triton kernel launch overhead. Could you please share any solutions or workarounds that have worked for you? Thank you!

Hi @void-main, first of all, thank you very much for your suggestions! I have a couple of questions. In my scenario, I have dynamically shaped inputs, so I wonder if...

@sleepwalker2017 If you don't need to debug CUDA code, you can remove the -G option from CMAKE_CUDA_FLAGS_DEBUG. With -G, nvcc generates device-side debug information and disables most device-code optimizations.