Ruihang Lai
LegalizeOps runs a RemoveAllUnused step here: https://github.com/tlc-pack/relax/blob/ce5c7f4117e4fcc33894a1449f751f32f8e1460e/python/tvm/relax/transform/legalize_ops.py#L777 It looks to me that RemoveAllUnused treats the MatchCast as an unused binding and therefore removes it. While in fact, the...
Just rebased mlc-ai/relax. Let's trigger the CI tomorrow.
Folks, we have supported Phi-3 mini 4k/128k and you can find the pre-converted models at https://huggingface.co/mlc-ai. Phi-3 is available in the Android app and iOS app now. Closing this issue...
Hi @moondiy @sebastienbo, our prebuilt package should include this commit by tomorrow. We will work on uploading prebuilt model weights to our huggingface and also the android app...
Thank you @marquicus for reporting. Would you mind sharing more of the backtrace for this error?
```
TVMError: Function vm.builtin.paged_attention_kv_cache_create_reduced(0: runtime.ShapeTuple, 1: int64_t, 2: int64_t, 3: int64_t, 4: int64_t, 5: int, 6:...
```
Just want to put a note here that we've bumped the ROCm support to 6.1/6.2 and you are welcome to try out the prebuilt mlc packages at https://llm.mlc.ai/docs/install/mlc_llm.html#option-1-prebuilt-package
Thank you @ChenYang3024 for reporting and raising this great point. We did not take JIT support into consideration when introducing the Paged KV cache and attention support. Right now the...
cc @CharlieFRuan
Thank you @gesanqiu for reporting this finding. My first impression is that this is likely a bug that needs a fix. We will discuss and see how we can address this....
Thank you @AleksanderObuchowski for the suggestion! We will update the documentation for that :-)