Siyuan Feng
I think this should already be supported. cc @vinx13
Compilation and weight conversion do not require reading all of the data into RAM/VRAM at once. In other words, 500GB of disk space (original weights + converted weights) is enough; there is no specific RAM or VRAM requirement.
`ndarray-cache.json` exists on [huggingface](https://huggingface.co/mlc-ai/gemma-2b-it-q4f16_1-MLC/blob/main/ndarray-cache.json). Please try removing the model and downloading it again.
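A minimal sketch of the remove-and-redownload step, assuming the weights were fetched with `git clone`; the local cache path is hypothetical and should be adjusted to wherever the model actually lives on your machine:

```shell
# Hypothetical local path — replace with your actual model directory.
rm -rf ./dist/gemma-2b-it-q4f16_1-MLC

# Re-download the full weight repository (URL from the reply above).
git clone https://huggingface.co/mlc-ai/gemma-2b-it-q4f16_1-MLC \
  ./dist/gemma-2b-it-q4f16_1-MLC
```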
Which SoC do you use? And could you please check whether it works on mobile phones?
I think it should be an easy enhancement. cc @junrushao
Octopus is based on the Gemma architecture, so mlc-llm should work well with it.
It is based on Mixtral, so MLC-LLM should work well :) We can upload the weights if possible.
I think it should work if you turn off FlashInfer and CUTLASS support. However, we do not have the resources to optimize for such an old device.
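A sketch of what turning those off might look like when building TVM from source, assuming the build is configured through `config.cmake`; whether these two switches are the right ones for your setup depends on how your build was configured:

```cmake
# In config.cmake, before running cmake — disable the GPU kernels
# that require newer hardware (assumption: these options exist in
# your TVM checkout's config.cmake).
set(USE_FLASHINFER OFF)
set(USE_CUTLASS OFF)
```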
Please specify `--context-window-size` for Qwen 1.5. BTW, I just ran it a few days ago and it works.
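A hedged sketch of passing the flag via `mlc_llm gen_config`; the model path, quantization, conversation template, and the window value 2048 are all illustrative assumptions, not recommendations from the reply above:

```shell
# Hypothetical paths and values — only --context-window-size itself
# is taken from the reply above.
mlc_llm gen_config ./dist/models/Qwen1.5-1.8B-Chat \
  --quantization q4f16_1 \
  --conv-template chatml \
  --context-window-size 2048 \
  -o ./dist/Qwen1.5-1.8B-Chat-q4f16_1-MLC
```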