Tianqi Chen

637 comments by Tianqi Chen

After reading a bit more, I think the easiest way would be to interface with the [TVM C API](https://github.com/apache/tvm/blob/main/include/tvm/runtime/c_runtime_api.h) and work through https://dart.dev/guides/libraries/c-interop. The Rust module might serve as a reference: https://github.com/mlc-ai/mlc-llm/pull/1213
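
As a rough illustration of the binding pattern (open the shared library, declare the C signatures from c_runtime_api.h, check the integer return codes), here is a minimal sketch using Python's ctypes; the same three steps carry over to dart:ffi. The library name/path and the global function name below are illustrative assumptions, not part of the original comment.

```python
# Minimal sketch: driving the TVM C runtime API through a generic FFI layer.
import ctypes

# Illustrative: the library name/path depends on platform and build setup.
lib = ctypes.CDLL("libtvm_runtime.so")

# int TVMFuncGetGlobal(const char* name, TVMFunctionHandle* out);
lib.TVMFuncGetGlobal.argtypes = [ctypes.c_char_p, ctypes.POINTER(ctypes.c_void_p)]
lib.TVMFuncGetGlobal.restype = ctypes.c_int
# const char* TVMGetLastError(void);
lib.TVMGetLastError.restype = ctypes.c_char_p

handle = ctypes.c_void_p()
# "runtime.ModuleLoadFromFile" is one example of a globally registered
# PackedFunc; any registered name works the same way.
if lib.TVMFuncGetGlobal(b"runtime.ModuleLoadFromFile", ctypes.byref(handle)) != 0:
    raise RuntimeError(lib.TVMGetLastError().decode())
if handle.value is None:
    # TVMFuncGetGlobal returns 0 but leaves the handle NULL when the
    # name is not registered.
    raise RuntimeError("global function not found")
print("got function handle:", handle.value)
```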

Feel free to convert the model yourself.

This is now fixed in the latest APK.

Likely we need to further restrict the group sizes; these devices' memory is too small for Llama-style models.
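
To make the group-size trade-off concrete, here is a back-of-the-envelope sketch; the 7B parameter count, 4-bit weights, and fp16 scales are illustrative assumptions about a group-quantized Llama-style model, not figures from the original comment.

```python
# Rough estimate of quantized weight memory under group quantization:
# each group of `group_size` 4-bit weights stores one fp16 scale, so
# smaller groups mean more per-group overhead per weight.
params = 7e9          # illustrative: a 7B-parameter model
bits_per_weight = 4   # illustrative: 4-bit weight quantization

def quantized_weight_bytes(group_size: int) -> float:
    effective_bits = bits_per_weight + 16 / group_size  # fp16 scale per group
    return params * effective_bits / 8

for g in (32, 64, 128):
    print(f"group_size={g}: ~{quantized_weight_bytes(g) / 2**30:.2f} GiB")
```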

We do depend on CMake to support Android; we recommend a conda environment to enable such cases.

The latest Android SDK might help address related issues; see https://llm.mlc.ai/docs/deploy/android.html

Thanks for the suggestions. Indeed, we have recently been moving towards encouraging JIT compilation to simplify our flow. Please check out some of the latest tutorials: https://llm.mlc.ai/docs/get_started/introduction.html
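
Following that introduction tutorial, the JIT flow looks roughly like the sketch below: the engine compiles the model library on the fly at load time, with no separate ahead-of-time compile step. The model id is one of the prebuilt examples from the docs; substitute your own.

```python
# Sketch of the JIT-compile flow from the MLC LLM introduction tutorial.
from mlc_llm import MLCEngine

model = "HF://mlc-ai/Llama-3-8B-Instruct-q4f16_1-MLC"  # example model id
engine = MLCEngine(model)  # model library is JIT-compiled at load time

# Stream a chat completion through the OpenAI-style API.
for response in engine.chat.completions.create(
    messages=[{"role": "user", "content": "What is JIT compilation?"}],
    model=model,
    stream=True,
):
    for choice in response.choices:
        print(choice.delta.content or "", end="", flush=True)

engine.terminate()
```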

This was due to the prefill_chunk_size setting; reducing it should help with the issue.
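
For example, one way to lower it is to edit the `prefill_chunk_size` field in the generated mlc-chat-config.json; a minimal sketch follows, where the config path and the value 256 are illustrative assumptions.

```python
# Sketch: lower prefill_chunk_size in an already-generated config.
import json
from pathlib import Path

# Illustrative path; point this at your model's mlc-chat-config.json.
cfg_path = Path("dist/Llama-3-8B-Instruct-q4f16_1-MLC/mlc-chat-config.json")
cfg = json.loads(cfg_path.read_text())

# A smaller chunk uses less activation memory per prefill step, at the
# cost of more prefill iterations for long prompts.
cfg["prefill_chunk_size"] = 256

cfg_path.write_text(json.dumps(cfg, indent=2))
```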