Tianqi Chen
After reading a bit more, I think the easiest way would be to interface with the [TVM C API](https://github.com/apache/tvm/blob/main/include/tvm/runtime/c_runtime_api.h) and work through https://dart.dev/guides/libraries/c-interop. The Rust module might serve as a reference: https://github.com/mlc-ai/mlc-llm/pull/1213
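To get a feel for the surface area that `dart:ffi` would need to bind, here is a minimal sketch that drives the same C entry points from Python's `ctypes` instead. The entry points come from `c_runtime_api.h`; the shared library name and the `runtime.ModuleLoadFromFile` global-function name are assumptions about your particular build.

```python
import ctypes

# Library name/path is an assumption; on Android it would be bundled with the app.
lib = ctypes.CDLL("libtvm_runtime.so")

lib.TVMGetLastError.restype = ctypes.c_char_p
lib.TVMFuncGetGlobal.argtypes = [ctypes.c_char_p, ctypes.POINTER(ctypes.c_void_p)]
lib.TVMFuncGetGlobal.restype = ctypes.c_int

handle = ctypes.c_void_p()
# Look up a packed function in the runtime's global registry by name.
if lib.TVMFuncGetGlobal(b"runtime.ModuleLoadFromFile", ctypes.byref(handle)) != 0:
    raise RuntimeError(lib.TVMGetLastError().decode())
if not handle:  # a NULL handle means the name is not registered
    raise RuntimeError("global function not found")
print("packed function handle:", handle.value)
```

A Dart binding would follow the same shape: open the library, declare the C signatures, and call through opaque handles, with `TVMFuncCall` doing the actual argument marshalling.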
Feel free to convert the model yourself.
This is now fixed in the latest APK.
Likely we need to further restrict the group sizes, and these devices' memory is too small for the Llama-style models.
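The memory point can be made concrete with a back-of-envelope estimate (illustrative arithmetic only, not measured numbers):

```python
def weight_mem_gb(n_params_billion: float, bits_per_weight: float) -> float:
    """Approximate GiB needed just for the quantized weights."""
    return n_params_billion * 1e9 * bits_per_weight / 8 / 1024**3

# A Llama-style 7B model at 4-bit still needs ~3.3 GiB for weights alone,
# before the KV cache and activations are counted.
print(f"{weight_mem_gb(7, 4):.1f} GiB")  # ~3.3
```

That is already beyond what many entry-level phones can dedicate to a single app, which is why smaller models are the practical choice on such devices.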
We do depend on CMake to support Android; we recommend a conda environment to enable such cases.
The latest Android SDK might help address the related issues; see https://llm.mlc.ai/docs/deploy/android.html
Thanks for the suggestions. Indeed, we have recently been moving towards encouraging JIT compilation to simplify our flow. Please check out some of the latest tutorials: https://llm.mlc.ai/docs/get_started/introduction.html
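For reference, a minimal sketch of the JIT flow roughly following that tutorial — the model is compiled on first use, with no explicit convert/compile step. The model id here is one of the prebuilt MLC weights on Hugging Face; substitute your own:

```python
from mlc_llm import MLCEngine

model = "HF://mlc-ai/Llama-3-8B-Instruct-q4f16_1-MLC"
engine = MLCEngine(model)  # JIT-compiles the model on first use

# OpenAI-style streaming chat completion.
for response in engine.chat.completions.create(
    messages=[{"role": "user", "content": "What is the meaning of life?"}],
    model=model,
    stream=True,
):
    for choice in response.choices:
        print(choice.delta.content, end="", flush=True)
print()
engine.terminate()
```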
#2295 should address this.
Please try out the latest command in https://llm.mlc.ai/docs/get_started/quick_start.html
This was due to the prefill_chunk_size setting; reducing it should help with the issue.
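As a sketch, you can lower the field directly in the model's `mlc-chat-config.json` (the path below is an assumption about your output layout; adjust it to where your converted model lives):

```python
import json

# Hypothetical path; point this at your model's mlc-chat-config.json.
cfg_path = "dist/Llama-2-7b-chat-hf-q4f16_1-MLC/mlc-chat-config.json"

with open(cfg_path) as f:
    cfg = json.load(f)
print("current prefill_chunk_size:", cfg["prefill_chunk_size"])

# Smaller chunks lower peak memory during prefill, at some prefill-speed cost.
cfg["prefill_chunk_size"] = 1024

with open(cfg_path, "w") as f:
    json.dump(cfg, f, indent=2)
```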