Junru Shao
It occurs only when the Metal binary is not properly built. Could you double check?
We've fixed several related issues in the past month, but could you double check whether the issue persists? If so, please open a new issue with detailed information so...
We are not using `tune_relax` because it only supports static-shape workloads. We will release a tutorial soon.
I'm not sure, but Vicuna-7B, as a language model, definitely suffers from potential hallucination.
At the moment, this project focuses on a single consumer-class GPU, making it possible for everyone to run models on their own laptops and phones. We will bring in distributed inference later.
We haven't announced Dolly yet, but it should work out of the box as of today. As for the issue you reported: I ran into it once when I hadn't compiled Metal...
This should work properly on the latest HEAD. Please feel free to open a new issue if the problem persists :-)
Yep of course :-)
It’s up! https://twitter.com/bohanhou1998/status/1655772690760994818
Hey, I made a Docker image that may help benchmark MLC LLM performance: https://github.com/junrushao/llm-perf-bench On the other hand, I don't really think Docker is a perfect abstraction for these use cases...