David Pissarra
Hi @XJY990705, it is actually still on f5f048b. You may be able to run it if you build everything from this branch (including TVM). I will rebase it in...
> Is this still being worked on, or is there already a kv cache quantization implemented?

Hi @kazunator. Yes, it is implemented. Since it hasn't been merged yet, you should...
@kazunator you should be able to run it by following the typical MLC model compilation flow (the following steps should be enough). For more details, feel free to refer to https://llm.mlc.ai/docs/compilation/compile_models.html...