Siyuan Feng

99 comments by Siyuan Feng

I think it should already be supported. cc @vinx13

Compile and convert do not require reading all the data into RAM/VRAM. In other words, 500GB of disk space (original weights + converted weights) is enough; there are no specific RAM or VRAM requirements.

`ndarray-cache.json` exists on [Hugging Face](https://huggingface.co/mlc-ai/gemma-2b-it-q4f16_1-MLC/blob/main/ndarray-cache.json). Please try removing the model and downloading it again.

I think it should be an easy enhancement. cc @junrushao

Octopus is based on the Gemma architecture, so MLC-LLM should work well.

It is based on Mixtral, so MLC-LLM should work well :) We can upload the weights if possible.

I think it should work if you turn off FlashInfer and CUTLASS support. However, we do not have the resources to optimize for such an old device.

Please specify `--context-window-size` for Qwen 1.5. BTW, I just ran it a few days ago and it works: ![151711345458_ pic](https://github.com/mlc-ai/mlc-llm/assets/25500082/475141fd-ae3d-4687-8c2e-41dfee6ba115)
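For concreteness, an invocation sketch with the flag in place. The model path is a placeholder for whatever local converted weights you have, and the snippet only prints the command as a dry run instead of launching the chat:

```shell
# Placeholder path to a locally converted Qwen 1.5 model (adjust to yours).
MODEL=./dist/Qwen1.5-1.8B-Chat-q4f16_1-MLC

# Qwen 1.5 needs the context window passed explicitly; 2048 is an example value.
CMD="mlc_llm chat $MODEL --context-window-size 2048"

echo "$CMD"   # dry run: show the command rather than executing it
```

Pick a window size that fits your device's memory; larger windows cost more KV-cache memory.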