XJY990705
Thank you very much!! It is a big help!
I want to use this PR, and I noticed the TVM branch it uses is f5f048b, but I got some bugs while compiling mlc-llm. Would you please tell me the...
@davidpissarra Thank you for your reply, I will try again.
I have already solved this problem via this PR: https://github.com/tlc-pack/libflash_attn/pull/8. Maybe when I switched to the f5f048b branch, this modification was lost. Anyway, thank you for your help!!
@davidpissarra I noticed that 3rdparty/tvm/src/runtime/relax_vm/paged_kv_cache.cc is unchanged, and its signature is mismatched with python/mlc_llm/nn/kv_cache.py when calling TVM_REGISTER_GLOBAL("vm.builtin.paged_attention_kv_cache_create_reduced"). Is there a commit you forgot to include?
Sorry to bother you again. I tried your method, and I got the exact same perplexity on the same datasets using int3 quantization, int4 quantization, and no quantization. And I want...
> We can consider adding this to our backlog, @liushz.

Thanks a lot!