XJY990705

7 comments by XJY990705

Thank you very much!! It is a big help!

I want to use this PR, and I noticed the TVM branch it uses is f5f048b, but I hit some bugs while compiling mlc-llm. Would you please tell me the...

@davidpissarra thank you for your reply; I will try again.

I have already solved this problem via this PR: https://github.com/tlc-pack/libflash_attn/pull/8. Maybe when I switched to the f5f048b branch, that modification was lost. Anyway, thank you for your help!!

@davidpissarra I noticed 3rdparty/tvm/src/runtime/relax_vm/paged_kv_cache.cc is not changed, and it is mismatched with python/mlc_llm/nn/kv_cache.py when calling TVM_REGISTER_GLOBAL("vm.builtin.paged_attention_kv_cache_create_reduced"). Is there a commit you forgot to push?

Sorry to bother you again. I tried your method and got the exact same PPL on the same datasets with int3 quantization, int4 quantization, and no quantization. And I want...

> We can consider adding this to our backlog, @liushz. thanks a lot