Raja Gond comments

Results 14 comments of



Regarding GEMV.AG and O.AG

@serendipity-zk

Intra-node compute-communication overlap without SM cores

Thanks for the reply. Additionally, why are you not doing that for prefill? Also, since decoding is memory-bound, wouldn't breaking it into two or more microbatches be inefficient?

Intra-node compute-communication overlap without SM cores

That makes sense. DeepSeek-V3/R1 is large, so 256 seems sufficient.

Question on SM Restriction and Latency Behavior

Thanks for the reply. Yeah, it’s an H100 PCIe box. However, I haven’t optimized the Triton kernel yet. One more question: when you’re running prefill and decode in parallel using...