Liu-congo

Results: 2 comments by Liu-congo

Hello, I'd like to try implementing it if possible.

Hmm, well, I guess it is caused by the synchronization between blocks (a similar error is shown [here](https://forums.developer.nvidia.com/t/problem-of-distributed-shared-memory/331150)). Maybe you should modify line 168 in https://github.com/flashinfer-ai/flashinfer/blob/main/include/flashinfer/attention/blackwell/kernel/sm100_fmha_mla_reduction.hpp and replace "__syncthreads();" with "auto grid =...