Aleksandar Samardžić

60 comments by Aleksandar Samardžić

Here is mine: [mixed_input.gemm.csv](https://github.com/user-attachments/files/16183579/mixed_input.gemm.csv) - all fine here. Not sure what could be causing the difference... I tested on an A100; the only difference may be that I rebased my branch...

> Checking the status on this reviewed PR. Is this already merged?

It doesn't seem to be merged yet.

FWIW, [here](https://github.com/pytorch/pytorch/pull/107782) is an update of mixed datatypes GEMM related CUTLASS extensions ported to CUTLASS 3.1.

More to come here: support for `U4`, support for generator in the CUTLASS library, etc. Still, opening this PR to solicit feedback for `S8`/`S4` and `S4`/`S8` GEMMs that are now available;...

Added generator support for S8/S4 and S4/S8.

---

AFAIK, implementing generator support for a given operation is not specifically documented, so I want to clarify the steps I've taken here. Basically,...

> Hi @alexsamardzic, thanks for working on this. Just wanted to clarify, will this kernel support int4 grouped per channel weight quantization + int8 per token dynamic activation quantization? This...

@manishucsd, @hwu36: Would it be possible for someone to review this PR (and eventually #1350 too)? These should not be controversial, are needed by PyTorch, and for this one I'd...

> How can I integrate this PR with PyTorch? Is there any example code available? @alexsamardzic

The primary motivation for this PR is to have this combination of operands...

> I'm a beginner with CUTLASS and have no idea how to use my own constructed s4/s8 data to run this GEMM. Could you please provide example code for...