Liu-congo
Hello, I'd like to try implementing it if possible.
Hmm, I guess it's caused by the synchronization between blocks (a similar error is shown [here](https://forums.developer.nvidia.com/t/problem-of-distributed-shared-memory/331150)). Maybe you should modify line 168 in https://github.com/flashinfer-ai/flashinfer/blob/main/include/flashinfer/attention/blackwell/kernel/sm100_fmha_mla_reduction.hpp and replace "__syncthreads();" with "auto grid =...