Carl Guo issues




Init commit for sql dialect

[QST] WIP int8 quantized impl but can't get transposed LDSM to work with current layout

I wrote a minimal int8 quantized Flash Attention implementation that sees a 31% speedup with reasonable accuracy. One big issue hindering its performance is that it can't utilize LDSM for...
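For readers unfamiliar with the idea, here is a minimal sketch of int8 attention scores. It is not the implementation from this issue. It assumes symmetric per-tensor scales for Q and K, integer accumulation of S = Q K^T (an int32 accumulator on GPU), and dequantization by the product of the two scales. The helper names are hypothetical.

```python
# Hypothetical sketch: symmetric per-tensor int8 quantization of Q and K,
# integer dot products, then dequantization with the combined scale.

def quantize_int8(x):
    """Symmetric per-tensor quantization: x ~= scale * q, with q in [-127, 127]."""
    amax = max(abs(v) for row in x for v in row)
    scale = amax / 127.0 if amax > 0 else 1.0
    q = [[max(-127, min(127, round(v / scale))) for v in row] for row in x]
    return q, scale

def int8_scores(q_mat, k_mat):
    """Compute S = Q K^T from int8 codes, accumulating in integers."""
    qq, sq = quantize_int8(q_mat)
    kq, sk = quantize_int8(k_mat)
    out = []
    for qi in qq:
        row = []
        for kj in kq:
            acc = sum(a * b for a, b in zip(qi, kj))  # integer accumulation
            row.append(acc * sq * sk)                  # dequantize once per score
        out.append(row)
    return out

def float_scores(q_mat, k_mat):
    """Reference S = Q K^T in floating point."""
    return [[sum(a * b for a, b in zip(qi, kj)) for kj in k_mat] for qi in q_mat]

Q = [[0.5, -1.0, 0.25], [1.5, 0.0, -0.75]]
K = [[1.0, 0.5, -0.5], [-0.25, 2.0, 1.0]]
approx = int8_scores(Q, K)
exact = float_scores(Q, K)
```

Comparing `approx` with `exact` shows the rounding error that per-tensor scaling introduces. Per-row or per-block scales are a common refinement when outliers dominate `amax`.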

LLM forward pass doesn't work before freezing (matmul dtype mismatch)

I have been experimenting with the QAT pipeline for LLM text generation. I adapted the code from `examples/nlp/text-generation/quantize_causal_lm_model.py` and used gpt2-small for my model (but same...
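As background only, QAT pipelines commonly simulate quantization before freezing with "fake quantization": values are snapped to the int8 grid but returned as floating point, so matmuls keep matching float operands until the model is converted to real integer kernels. The sketch below illustrates that general idea. It does not reproduce the pipeline or script named in the issue, does not diagnose the reported mismatch, and its helper names and fixed scales are assumptions for illustration.

```python
# Hypothetical fake-quantization (quantize-dequantize) sketch.
# Scales are fixed here; in real QAT they are typically learned or observed.

def fake_quantize(values, scale, zero_point=0, qmin=-128, qmax=127):
    """Snap each value to the int8 grid, but return floating-point results."""
    out = []
    for v in values:
        q = round(v / scale) + zero_point       # integer code
        q = max(qmin, min(qmax, q))             # clamp to the int8 range
        out.append((q - zero_point) * scale)    # dequantize back to float
    return out

def linear(x, weight):
    """Plain float matmul: y[i] = sum_j weight[i][j] * x[j]."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]

x = [0.1, -0.26, 5.0]
x_fq = fake_quantize(x, scale=0.01)  # 5.0 saturates to 127 * 0.01
w_fq = [fake_quantize(row, scale=0.02) for row in [[0.5, -0.3, 0.1], [0.06, 0.2, -0.4]]]
y = linear(x_fq, w_fq)  # both operands are floats, so no dtype mismatch arises
```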

[QST] Support for movmatrix

**What is your question?** I would like to use [movmatrix.sync.aligned.shape.trans](https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#warp-level-matrix-instructions-movmatrix), but there doesn't seem to be a primitive for `movmatrix` like the one for `ldmatrix`. Is it simply not supported by CUTLASS yet?...
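To make the data movement concrete, here is a host-side Python simulation of `movmatrix.sync.aligned.m8n8.trans.b16`, based on our reading of the PTX ISA description: an 8x8 matrix of 16-bit values is spread across the 32 lanes of a warp, lane `l` holding row `l // 4`, columns `2 * (l % 4)` and `2 * (l % 4) + 1`, and after the instruction each lane holds its fragment of the transpose in the same layout. This is not a CUTLASS or CuTe API and does not run on a GPU; in CUDA code the instruction would typically be emitted through inline PTX `asm`. The function names are hypothetical.

```python
# Hypothetical host-side simulation of the warp-wide movmatrix transpose.
# Fragment layout: lane l holds (M[l // 4][2*(l % 4)], M[l // 4][2*(l % 4) + 1]).

def to_fragments(mat):
    """Distribute an 8x8 matrix across 32 lanes, two elements per lane."""
    frags = []
    for lane in range(32):
        row, col = lane // 4, 2 * (lane % 4)
        frags.append((mat[row][col], mat[row][col + 1]))
    return frags

def from_fragments(frags):
    """Reassemble the 8x8 matrix from the 32 per-lane fragments."""
    mat = [[None] * 8 for _ in range(8)]
    for lane, (a0, a1) in enumerate(frags):
        row, col = lane // 4, 2 * (lane % 4)
        mat[row][col], mat[row][col + 1] = a0, a1
    return mat

def movmatrix_trans(frags):
    """Every lane receives its fragment of M^T, in the same fragment layout."""
    src = from_fragments(frags)
    transposed = [[src[c][r] for c in range(8)] for r in range(8)]
    return to_fragments(transposed)

M = [[8 * r + c for c in range(8)] for r in range(8)]
out = movmatrix_trans(to_fragments(M))
print(out[0])  # (0, 8): lane 0 now holds M[0][0] and M[1][0]
```

The simulation makes clear why the instruction is a warp-level shuffle: after the transpose, lane 0's two elements come from rows 0 and 1, which were held by lanes 0 and 4 before the exchange.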
