XJY990705
Thank you very much!! It is a big help!
I want to use this PR, and I noticed the TVM branch it uses is f5f048b, but I got some bugs while compiling mlc-llm. Would you please tell me the...
@davidpissarra Thank you for your reply, I will try again.
I have already solved this problem via this PR: https://github.com/tlc-pack/libflash_attn/pull/8. Maybe when I switched to the f5f048b branch, this modification was lost. Anyway, thank you for your help!!
@davidpissarra I noticed that 3rdparty/tvm/src/runtime/relax_vm/paged_kv_cache.cc is unchanged, and its signature is mismatched with python/mlc_llm/nn/kv_cache.py when calling TVM_REGISTER_GLOBAL("vm.builtin.paged_attention_kv_cache_create_reduced"). Is there a commit you forgot to include?
Sorry to bother you again. I tried your method, and I got the exact same perplexity on the same datasets using int3 quantization, int4 quantization, and no quantization. And I want...
> We can consider adding this to our backlog, @liushz.

Thanks a lot!