Siyuan Feng
Actually, it performs well on Mac; please see the performance reports at https://github.com/mlc-ai/mlc-llm/issues/15. I guess you are using a Mac with an Intel integrated GPU. If so, there is nothing we can do as...
We use GPUs (including integrated GPUs) to run models. As the `Intel UHD Graphics 630` is too weak, there is nothing we can do to work around that hardware limitation.
All code, including the model conversion code, is in this repo. We are going to write tutorials soon.
As Relax has native dynamic-shape support and a modularized compilation flow, we won't spend time on Relay in this project. The project is built on Relax because: 1. We...
Could you try uninstalling and reinstalling `mlc-chat-nightly`?
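A reinstall along these lines usually does it (a minimal sketch; the `--pre` flag and the `https://mlc.ai/wheels` index are assumptions based on the usual nightly-wheel install instructions, so check the install docs for the exact command):

```shell
# Remove the currently installed package first, then fetch the latest
# nightly wheel. The wheel index URL and --pre flag are assumptions;
# see the project's installation docs for the authoritative command.
pip uninstall -y mlc-chat-nightly
pip install --pre mlc-chat-nightly -f https://mlc.ai/wheels
```

The clean uninstall matters: upgrading in place can leave stale compiled artifacts from an older nightly behind.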
Thanks for the bug report. It was just fixed in #316. You can give it a try after tomorrow's nightly build.
Please try RedPajama-INCITE-Chat-3B-v1, which is friendlier to 4 GB of VRAM.
Please see the [documentation](https://mlc.ai/mlc-llm/docs/tutorials/runtime/cpp.html#id2); you need to run `gen_cmake_config.py` to detect the device.
You are right. CPUs are too weak to run LLMs, so we focus only on the GPU environment due to our limited bandwidth. On the other hand, it's not hard...
It would be great if you could add test cases :)