David Pissarra issues


Results: 3 issues



[Serving] PagedKVCache Quantization

The KV cache can become a burden under tight memory constraints, and cache quantization can reduce its memory requirement by roughly 75% (float16 -> int3 KV cache). As a result, this...
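A back-of-the-envelope sketch of the memory arithmetic behind that figure follows. It assumes, hypothetically, that each group of quantized values shares one float16 scale. The group sizes and the helper names `bits_per_element` and `memory_reduction` are illustrative assumptions and are not taken from the PR.

```python
# Hypothetical helpers for estimating KV cache savings from quantization.
# Assumption (not from the PR): every `group_size` quantized values share
# one scale of `scale_bits` bits, so the scale cost is amortized per element.


def bits_per_element(quant_bits: int, group_size: int, scale_bits: int = 16) -> float:
    """Average storage bits per cached element, including the amortized scale."""
    return quant_bits + scale_bits / group_size


def memory_reduction(quant_bits: int, group_size: int,
                     scale_bits: int = 16, baseline_bits: int = 16) -> float:
    """Fraction of memory saved relative to a float16 (16-bit) baseline."""
    return 1.0 - bits_per_element(quant_bits, group_size, scale_bits) / baseline_bits


if __name__ == "__main__":
    # (label, group_size, scale_bits): illustrative configurations only.
    cases = [
        ("int3, no scale overhead", 32, 0),
        ("int3, fp16 scale per 32 values", 32, 16),
        ("int3, fp16 scale per 16 values", 16, 16),
    ]
    for label, group_size, scale_bits in cases:
        saved = memory_reduction(3, group_size, scale_bits)
        print(f"{label}: {saved:.2%} smaller than float16")
```

Pure 3-bit storage would save 81.25% (3 of 16 bits). Once per-group scales are counted, the saving drops: one float16 scale per 32 values gives 3.5 bits per element (78.125% saved), and one per 16 values gives exactly 4 bits per element (75% saved). This is why a "roughly 75%" figure for int3 is plausible once metadata is included.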

[KVCache] PagedKVCache Quantization

PR supporting https://github.com/mlc-ai/mlc-llm/pull/2663.

[QST] possible performance bug due to disabling inlining for ldmatrix

I recently noticed that inlining was disabled for all `ldmatrix` wrappers in [`ldsm.h`](https://github.com/tile-ai/tilelang/blob/409ab83d6e74d177d2e178166328efb6a43bec25/src/tl_templates/cuda/ldsm.h#L7) by this PR: https://github.com/tile-ai/tilelang/pull/227. Whenever an `ldmatrix` wrapper in `ldsm.h` is invoked, nvcc is not able to keep matrices...