David Pissarra

3 issues by David Pissarra

The KV cache can become a burden under tight memory constraints, and quantizing the cache can reduce its memory footprint by roughly 75% (float16 -> int3 KV cache). As a result, this...
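To see where a figure like "roughly 75%" comes from, here is a minimal back-of-the-envelope sketch. It assumes a 7B-class model shape (32 layers, 32 KV heads, head dimension 128) and a hypothetical group-quantization scheme that stores one float16 scale per group of 32 values. The function `kv_cache_bytes` and all parameters are illustrative, not taken from the issue. Raw int3 storage alone would save 81.25% versus float16. Per-group scales, and in practice bit-packing and other metadata, eat into that, which is why the real-world saving lands nearer 75-80%.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len, bits,
                   group_size=None, scale_bits=16):
    """Approximate KV cache size in bytes for a single sequence.

    Hypothetical model: every cached element costs `bits` bits, plus (if
    group_size is given) one scale of `scale_bits` bits per group of
    `group_size` elements. Packing overhead is ignored.
    """
    # Factor 2: both keys and values are cached for every layer/head/token.
    elems = 2 * num_layers * num_kv_heads * head_dim * seq_len
    total_bits = elems * bits
    if group_size is not None:
        # Assumed metadata: one scale per quantization group.
        total_bits += (elems // group_size) * scale_bits
    return total_bits / 8


if __name__ == "__main__":
    # Illustrative 7B-class shape; not taken from the original issue.
    shape = dict(num_layers=32, num_kv_heads=32, head_dim=128, seq_len=4096)
    fp16 = kv_cache_bytes(**shape, bits=16)
    int3 = kv_cache_bytes(**shape, bits=3)
    int3_scaled = kv_cache_bytes(**shape, bits=3, group_size=32)
    print(f"float16:            {fp16 / 2**30:.3f} GiB")
    print(f"int3 (raw):         {int3 / 2**30:.3f} GiB, saves {1 - int3 / fp16:.2%}")
    print(f"int3 + group scale: {int3_scaled / 2**30:.3f} GiB, saves {1 - int3_scaled / fp16:.2%}")
```

Under these assumptions the float16 cache is 2 GiB per 4096-token sequence. Adding one float16 scale per 32 values costs an extra 0.5 bits per element, which brings the saving down from 81.25% to about 78%.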

PR supporting https://github.com/mlc-ai/mlc-llm/pull/2663.

I recently noticed that inlining was disabled for all `ldmatrix` instructions in [`ldsm.h`](https://github.com/tile-ai/tilelang/blob/409ab83d6e74d177d2e178166328efb6a43bec25/src/tl_templates/cuda/ldsm.h#L7) in this PR: https://github.com/tile-ai/tilelang/pull/227. Whenever `ldmatrix` in `ldsm.h` is invoked, nvcc is not able to keep matrices...