Volodymyr Kyrylov

26 comments by Volodymyr Kyrylov

Thanks for the submission, @loguntsov! I currently don't have the bandwidth to test ejabberd 20. Is there a way we could do this with GitHub Actions?

+1, looking forward to this great feature!

Hi! The current workaround is to pad the inputs to the next power of two before scanning. Could you tell me more about your use case?
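A minimal pure-Python sketch of that workaround. It assumes the scan computes the first-order linear recurrence h_t = gate_t * h_{t-1} + token_t and, as the workaround implies, only accepts power-of-two lengths. The helper names (`next_power_of_two`, `pow2_scan`, `scan_any_length`) are hypothetical, and `pow2_scan` is a Hillis-Steele scan standing in for the real kernel, not Accelerated Scan's API.

```python
def next_power_of_two(n: int) -> int:
    """Smallest power of two that is >= n, for n >= 1."""
    return 1 << (n - 1).bit_length()


def combine(left, right):
    """Compose two affine steps h -> a*h + b; `left` is applied first."""
    a1, b1 = left
    a2, b2 = right
    return (a1 * a2, a2 * b1 + b2)


def pow2_scan(gates, tokens):
    """Inclusive Hillis-Steele scan of h_t = gates[t] * h_{t-1} + tokens[t], h_{-1} = 0.

    Hypothetical stand-in: rejects non-power-of-two lengths to mimic the kernel restriction.
    """
    n = len(gates)
    if n & (n - 1):
        raise ValueError("sequence length must be a power of two")
    elems = list(zip(gates, tokens))
    step = 1
    while step < n:
        # Each position absorbs the partial result `step` positions to its left.
        elems = [combine(elems[i - step], elems[i]) if i >= step else elems[i]
                 for i in range(n)]
        step *= 2
    return [b for _, b in elems]


def scan_any_length(gates, tokens):
    """Pad to the next power of two, scan, then drop the padded tail."""
    n = len(gates)
    m = next_power_of_two(n)
    # gate = 1, token = 0 is the identity update: the last state is carried through unchanged.
    padded_gates = list(gates) + [1.0] * (m - n)
    padded_tokens = list(tokens) + [0.0] * (m - n)
    return pow2_scan(padded_gates, padded_tokens)[:n]


print(scan_any_length([0.5, 0.5, 0.5], [1.0, 1.0, 1.0]))  # [1.0, 1.5, 1.75]
```

Because the recurrence only looks backward, whatever sits in the padded tail cannot change the first n outputs; padding with gate 1 and token 0 additionally keeps the final state equal to the last real one, which matters if that state is passed on to a later chunk.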

@jeromeku, thank you for the kind words! Glad you checked out nanokitchen as well. It would indeed be possible to use Accelerated Scan for Mamba as is; however, it would work...
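To illustrate what "as is" could look like: Mamba's selective SSM has a diagonal state matrix, so within one channel each of the N state entries follows its own scalar recurrence of the gates/tokens form. The sketch below assumes the commonly used simplified discretization (gates = exp(Δ·A), tokens = Δ·B·x), omits the D skip term, and uses a sequential `linear_scan` as a stand-in for a scan kernel. `selective_ssm_channel` and `linear_scan` are hypothetical names, not functions from Mamba or Accelerated Scan.

```python
import math


def linear_scan(gates, tokens):
    """Sequential reference for h_t = gates[t] * h_{t-1} + tokens[t], with h_{-1} = 0.

    Stands in for a parallel scan kernel (hypothetical helper).
    """
    h, out = 0.0, []
    for a, b in zip(gates, tokens):
        h = a * h + b
        out.append(h)
    return out


def selective_ssm_channel(x, delta, A, B, C):
    """Output of a diagonal selective SSM for ONE channel (hypothetical helper).

    x, delta: length-L sequences; A: length-N state decay rates (typically negative);
    B, C: L x N input-dependent projections. The D skip term is omitted.
    """
    L, N = len(x), len(A)
    y = [0.0] * L
    for n in range(N):  # each state entry is an independent scalar recurrence
        gates = [math.exp(delta[t] * A[n]) for t in range(L)]
        tokens = [delta[t] * B[t][n] * x[t] for t in range(L)]
        h = linear_scan(gates, tokens)
        for t in range(L):
            y[t] += C[t][n] * h[t]  # y_t = sum_n C_t[n] * h_t[n]
    return y
```

The trade-off of this route is that gates and tokens are materialized for every (time, state) pair before the scan runs, rather than being computed inside the kernel.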

I found that @srush has done this exact fusion of the SSM bits into the Triton forward kernel here: https://github.com/srush/annotated-mamba/issues/1#issuecomment-1885866368

There's some discussion about making a reverse `tl.associative_scan` in https://github.com/openai/triton/issues/2930
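One place a reverse scan shows up is the backward pass of the same linear recurrence: the total gradient with respect to h_t satisfies dh_t = grad_t + gate_{t+1} · dh_{t+1}, which is a linear scan run from the end of the sequence. Without a reverse mode, one workaround is to flip the sequences, run the forward scan, and flip the result back. The following is a pure-Python sketch with hypothetical names, not Triton code.

```python
def forward_scan(gates, tokens):
    """h_t = gates[t] * h_{t-1} + tokens[t], with h_{-1} = 0 (sequential reference)."""
    h, out = 0.0, []
    for a, b in zip(gates, tokens):
        h = a * h + b
        out.append(h)
    return out


def reverse_scan(gates, tokens):
    """r_t = gates[t] * r_{t+1} + tokens[t], with r_L = 0, via flip -> forward scan -> flip."""
    return forward_scan(gates[::-1], tokens[::-1])[::-1]


def linear_scan_backward(gates, tokens, grad_out):
    """Gradients of sum_t grad_out[t] * h_t with respect to gates and tokens."""
    h = forward_scan(gates, tokens)
    # h_{t+1} = gates[t+1] * h_t + ..., so the reverse scan uses gates shifted left by one.
    shifted_gates = list(gates[1:]) + [0.0]
    d_h = reverse_scan(shifted_gates, grad_out)
    h_prev = [0.0] + h[:-1]
    d_gates = [dh * hp for dh, hp in zip(d_h, h_prev)]  # dh_t/dgates[t] = h_{t-1}
    d_tokens = d_h  # dh_t/dtokens[t] = 1
    return d_gates, d_tokens
```

The flip trick can also be applied around a forward-only scan call at the tensor level, at the cost of the extra memory traffic for the flipped copies.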