Siyuan Feng
I think this should already be supported. cc @vinx13
Compilation and weight conversion do not require reading all of the data into RAM/VRAM at once. In other words, 500GB of disk space (original weights + converted weights) is enough; there is no specific RAM or VRAM requirement.
`ndarray-cache.json` exists on [huggingface](https://huggingface.co/mlc-ai/gemma-2b-it-q4f16_1-MLC/blob/main/ndarray-cache.json). Please try removing the model and downloading it again.
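A minimal sketch of the remove-and-redownload step, assuming the weights were fetched with `git clone`; the local cache path is hypothetical and should be adjusted to wherever the model actually lives on your machine:

```shell
# Hypothetical local path — replace with your actual model directory.
rm -rf ./dist/gemma-2b-it-q4f16_1-MLC

# Re-download the full weight repository (URL from the reply above).
git clone https://huggingface.co/mlc-ai/gemma-2b-it-q4f16_1-MLC \
  ./dist/gemma-2b-it-q4f16_1-MLC
```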
Which SoC do you use? And could you please check whether it works on mobile phones?
I think it should be an easy enhancement. cc @junrushao
Octopus is based on the Gemma architecture, so mlc-llm should work well with it.
It is based on Mixtral, so MLC-LLM should work well :) We can upload the weights if possible.
I think it should work if you turn off FlashInfer and CUTLASS support. However, we do not have the resources to optimize for such an old device.
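A sketch of what turning those off might look like when building TVM from source, assuming the build is configured through `config.cmake`; whether these two switches are the right ones for your setup depends on how your build was configured:

```cmake
# In config.cmake, before running cmake — disable the GPU kernels
# that require newer hardware (assumption: these options exist in
# your TVM checkout's config.cmake).
set(USE_FLASHINFER OFF)
set(USE_CUTLASS OFF)
```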
Please specify `--context-window-size` for Qwen 1.5. BTW, I just ran it a few days ago and it works.
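A hedged sketch of passing the flag via `mlc_llm gen_config`; the model path, quantization, conversation template, and the window value 2048 are all illustrative assumptions, not recommendations from the reply above:

```shell
# Hypothetical paths and values — only --context-window-size itself
# is taken from the reply above.
mlc_llm gen_config ./dist/models/Qwen1.5-1.8B-Chat \
  --quantization q4f16_1 \
  --conv-template chatml \
  --context-window-size 2048 \
  -o ./dist/Qwen1.5-1.8B-Chat-q4f16_1-MLC
```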