Reasons for switching to CUTLASS-based kernel instead of custom kernel

Open Yard1 opened this issue 1 year ago • 2 comments

Hey folks, awesome and really impactful work with the repo and the paper.

I was wondering what the reason was for switching from the original bgmv kernel to a CUTLASS-based sgmv one. I understand that one advantage of sgmv is that it doesn't require the LoRA tensors to be in a single contiguous block of memory, but aside from that, were there any performance considerations behind the switch?
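To make the memory-layout point concrete, here is a minimal pure-Python sketch of the two semantics as I understand them. It is an illustration only: the names `matvec`, `bgmv_reference`, and `sgmv_reference`, and the argument layouts, are hypothetical and not Punica's API, and the real kernels are CUDA. The bgmv-style version indexes into one stacked weight tensor per request, while the sgmv-style version takes a separate weight matrix per segment of consecutive requests, so the matrices can live anywhere in memory.

```python
def matvec(x, w):
    # x: vector of length h_in; w: h_in x h_out matrix (list of rows).
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(len(w[0]))]


def bgmv_reference(xs, stacked_weights, indices):
    # Hypothetical reference: every request gathers its LoRA matrix by index
    # from a single stacked (contiguous) collection of weights.
    return [matvec(x, stacked_weights[idx]) for x, idx in zip(xs, indices)]


def sgmv_reference(xs, segment_weights, seg_starts):
    # Hypothetical reference: requests are grouped into segments that share a
    # LoRA; each segment gets its own, independently allocated weight matrix.
    # seg_starts holds num_segments + 1 offsets into xs.
    out = []
    for w, start, end in zip(segment_weights, seg_starts[:-1], seg_starts[1:]):
        out.extend(matvec(x, w) for x in xs[start:end])
    return out


# Two LoRA matrices: an identity and a swap of the two coordinates.
w0 = [[1, 0], [0, 1]]
w1 = [[0, 1], [1, 0]]
xs = [[1, 2], [3, 4], [5, 6]]

bgmv_out = bgmv_reference(xs, [w0, w1], indices=[0, 0, 1])
sgmv_out = sgmv_reference(xs, [w0, w1], seg_starts=[0, 2, 3])
```

Both calls compute the same per-request products here; the difference is only in how the weights are addressed, which is why I'm asking whether performance also played a role.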

I can also see that there is a custom sgmv shrink kernel implementation but the expand version is WIP. Is that something you are planning to work on in the near future?

Furthermore, do the performance results in the paper concern the CUTLASS kernel or the custom kernel? From the description of the implementation I inferred the latter, but I was confused by the lack of the custom expand kernel in the repo.

Thanks, and great work!

Yard1, Nov 09 '23 18:11