David Pissarra comments

3 comments

[Serving] PagedKVCache Quantization

Hi @XJY990705, it is actually still on f5f048b. You may be able to run it if you build everything from this branch (including tvm). I will rebase it in...

[Serving] PagedKVCache Quantization

> Is this still being worked on, or is there already a kv cache quantization implemented?

Hi @kazunator. It is indeed implemented. Since it hasn't been merged yet, you should...

[Serving] PagedKVCache Quantization

@kazunator, you should be able to run it by following the typical MLC model compilation flow (the following steps should be enough). For more details, feel free to refer to https://llm.mlc.ai/docs/compilation/compile_models.html...
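For reference, the typical flow in the linked MLC docs has three steps: convert the weights, generate the chat config, and compile the model library. Below is a minimal sketch of that generic flow. It assumes the `mlc_llm` CLI subcommands (`convert_weight`, `gen_config`, `compile`) as documented there. It is not the exact list of steps from the comment above, which is truncated. It also does not show any options specific to KV cache quantization in this PR. The function name `mlc_compile_flow`, the `MLC` override variable, the example paths and the output library name are hypothetical placeholders.

```shell
#!/usr/bin/env bash
# Sketch of the generic three-step MLC LLM compilation flow
# (convert weights -> generate config -> compile model library).
# Function name, MLC override variable and output names are hypothetical;
# no PR-specific KV cache quantization options are included.
set -euo pipefail

mlc_compile_flow() {
  local model_dir="$1"      # directory with the original Hugging Face weights
  local quant="$2"          # weight quantization mode, e.g. q4f16_1
  local conv_template="$3"  # conversation template name, e.g. llama-2
  local device="$4"         # target device, e.g. cuda, metal, vulkan
  local out_dir="$5"        # output directory for converted weights and config
  local mlc="${MLC:-mlc_llm}"  # CLI to invoke; set MLC=echo for a dry run

  # 1. Convert (and quantize) the weights into the MLC format.
  "$mlc" convert_weight "$model_dir" --quantization "$quant" -o "$out_dir"
  # 2. Generate mlc-chat-config.json next to the converted weights.
  "$mlc" gen_config "$model_dir" --quantization "$quant" \
    --conv-template "$conv_template" -o "$out_dir"
  # 3. Compile the model library for the target device.
  "$mlc" compile "$out_dir/mlc-chat-config.json" --device "$device" \
    -o "$out_dir/model-$device.so"
}
```

For example, `mlc_compile_flow ./dist/models/Llama-2-7b-chat-hf q4f16_1 llama-2 cuda ./dist/Llama-2-7b-chat-hf-q4f16_1-MLC`. Setting `MLC=echo` prints the three commands instead of running them. This lets you check the arguments before starting a long compile.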