Deep dive into the Megakernel approach since it's well aligned with OCANNL design

Open lukstafi opened this issue 7 months ago • 4 comments

They implement an interpreter on the GPU. Maybe we can avoid that and still use their solutions for within-kernel synchronization, or maybe we go the interpreter route. To be decided.

https://hazyresearch.stanford.edu/blog/2025-05-27-no-bubbles?s=08
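To make the "interpreter plus within-kernel synchronization" idea concrete, here is a rough CPU-side simulation, not code from the linked post or from OCANNL. It assumes each worker (standing in for an SM) runs its own instruction list in order, and that dependencies between workers are expressed through shared counters (standing in for counters in GPU global memory). All names (`Instr`, `Counters`, `interpreter`, `run_megakernel`) are hypothetical.

```python
import threading
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Instr:
    op: Callable[[], None]          # the work itself (stands in for a fused op)
    wait_counter: Optional[int]     # counter to wait on before running, or None
    wait_target: int                # value that counter must reach first
    signal_counter: Optional[int]   # counter to increment when done, or None


class Counters:
    """Stand-in for an array of counters in GPU global memory."""

    def __init__(self, n: int) -> None:
        self.values = [0] * n
        self.cond = threading.Condition()

    def wait_at_least(self, idx: int, target: int) -> None:
        with self.cond:
            self.cond.wait_for(lambda: self.values[idx] >= target)

    def increment(self, idx: int) -> None:
        with self.cond:
            self.values[idx] += 1
            self.cond.notify_all()


def interpreter(program: list, counters: Counters) -> None:
    """One worker (stand-in for an SM) executes its instructions in order."""
    for instr in program:
        if instr.wait_counter is not None:
            counters.wait_at_least(instr.wait_counter, instr.wait_target)
        instr.op()
        if instr.signal_counter is not None:
            counters.increment(instr.signal_counter)


def run_megakernel(programs: list, n_counters: int) -> None:
    """Launch all workers once; no relaunch happens between dependent ops."""
    counters = Counters(n_counters)
    workers = [
        threading.Thread(target=interpreter, args=(p, counters))
        for p in programs
    ]
    for w in workers:
        w.start()
    for w in workers:
        w.join()


# Usage: two independent ops feed one dependent op via counter 0.
log = []
log_lock = threading.Lock()


def record(name: str) -> Callable[[], None]:
    def op() -> None:
        with log_lock:
            log.append(name)
    return op


programs = [
    [Instr(record("matmul_a"), None, 0, 0)],
    [Instr(record("matmul_b"), None, 0, 0)],
    [Instr(record("reduce"), 0, 2, None)],  # runs once counter 0 reaches 2
]
run_megakernel(programs, n_counters=1)
print(log)
```

In this sketch the counter wait/increment pair is the only synchronization, so the same scheme could in principle be used without a general interpreter by baking each worker's instruction list into generated code. Whether that carries over to real GPU kernels is exactly the open question here.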

May 28 '25 09:05 lukstafi

https://github.com/mirage-project/mirage/tree/mpk

https://zhihaojia.medium.com/compiling-llms-into-a-megakernel-a-path-to-low-latency-inference-cf7840913c17

https://x.com/JiaZhihao/status/1935767958963314773

Jun 20 '25 07:06 lukstafi

Out of curiosity, why do you say that the Megakernel approach is aligned with OCANNL's design?

Jul 24 '25 06:07 derekchiang

Because splitting of megakernels into proper kernels is not implemented yet.

Jul 24 '25 07:07 lukstafi

Less tongue-in-cheek: megakernel = routine in OCANNL terminology.

Jul 24 '25 07:07 lukstafi