Carl Guo

Results 4 issues of Carl Guo

I wrote a minimal int8-quantized Flash Attention implementation that achieves a 31% speedup with reasonable accuracy. One big issue hindering its performance is that it can't utilize LDSM for...

I have been experimenting with the QAT pipeline for LLM text generation. I adapted the code from `examples/nlp/text-generation/quantize_causal_lm_model.py` and used gpt2-small as my model (but same...

**What is your question?** I would like to use [movmatrix.sync.aligned.shape.trans](https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#warp-level-matrix-instructions-movmatrix), but there doesn't seem to be a primitive for `movmatrix` the way there is for `ldmatrix`. Is it just not supported by CUTLASS yet?...

feature request
help wanted
question