mlc-llm
Multi-GPU support for larger-than-VRAM models
Awesome project, thanks!
Does it support sharding large models across multiple GPUs, or would this be in scope for this project in the future?
Thank you for the suggestion! Yes, we'd love to support the needs of the community and will add this to our roadmap.
llama.cpp seems to support this now, FYI!
Multi-GPU support has now landed in MLC.
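For anyone landing on this thread later, here is a minimal sketch of what using it from the Python API might look like, assuming the `MLCEngine` entry point and that `EngineConfig` exposes `tensor_parallel_shards` (the name MLC uses for its tensor-parallelism setting); the exact import paths, signatures, and the example model id may differ by version:

```python
# Hedged sketch: sharding a model across two GPUs with the mlc_llm
# Python API. Names below are assumptions based on the current API
# and may differ by version; the model id is only an example.
from mlc_llm import MLCEngine
from mlc_llm.serve.config import EngineConfig

# Shard weights and computation across 2 GPUs via tensor parallelism
# (assumes EngineConfig accepts tensor_parallel_shards).
engine = MLCEngine(
    model="HF://mlc-ai/Llama-2-70b-chat-hf-q4f16_1-MLC",
    engine_config=EngineConfig(tensor_parallel_shards=2),
)

# OpenAI-style chat completion against the sharded model.
response = engine.chat.completions.create(
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)

engine.terminate()
```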