Tianqi Chen
Let us also confirm whether this is the case for JSONFFIEngine.
Just to follow up on the case of JSONFFIEngine: its main purpose is to avoid passing objects in and parsing mlc-chat-config on the FFI side, so the current...
@MasterJH5574 it would be good to confirm the current state of this issue in JSONFFI.
The latest MLCEngine should support concurrent generation and config reading; see #2217.
KV cache is a common interface; the solution right now would be to create a different instance of a KV cache implementation of the same interface and replace it.
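To illustrate the idea, here is a minimal sketch of swapping one KV cache implementation for another behind a shared interface. All class and method names here are hypothetical, not the actual mlc-llm API:

```python
from abc import ABC, abstractmethod

# Hypothetical common interface; the real mlc-llm KV cache interface differs.
class KVCacheInterface(ABC):
    @abstractmethod
    def add_sequence(self, seq_id: int) -> None: ...

    @abstractmethod
    def name(self) -> str: ...

class PagedKVCache(KVCacheInterface):
    def __init__(self) -> None:
        self.seqs: set[int] = set()

    def add_sequence(self, seq_id: int) -> None:
        self.seqs.add(seq_id)

    def name(self) -> str:
        return "paged"

class SlidingWindowKVCache(KVCacheInterface):
    def __init__(self) -> None:
        self.seqs: set[int] = set()

    def add_sequence(self, seq_id: int) -> None:
        self.seqs.add(seq_id)

    def name(self) -> str:
        return "sliding-window"

class Engine:
    """Engine only depends on the interface, never a concrete cache type."""

    def __init__(self, cache: KVCacheInterface) -> None:
        self.cache = cache

    def swap_cache(self, new_cache: KVCacheInterface) -> None:
        # Replace the implementation; callers only ever see the interface.
        self.cache = new_cache

engine = Engine(PagedKVCache())
engine.swap_cache(SlidingWindowKVCache())
print(engine.cache.name())  # sliding-window
```

Because the engine holds only the interface, the swap does not require touching any caller code.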
This is something we would ideally like to enable, and indeed we need to overcome some of the hurdles mentioned. We can keep this issue open to track the status,...
Thanks for reporting. As a temporary measure, reducing the prefill chunk size might help. We should follow up by automatically limiting this number when we run gen config.
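For reference, a sketch of what lowering the prefill chunk size looks like in `mlc-chat-config.json` (the value here is illustrative, and other fields are omitted):

```json
{
  "prefill_chunk_size": 1024
}
```

A smaller chunk size reduces the peak memory used during prefill at some cost in prefill throughput.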
@ahz-r3v you might need to cross-check whether you have recompiled the lib.
Closing, as the delivery flow has now landed.
Added https://github.com/mlc-ai/mlc-llm/pull/2445.