Junru Shao
It occurs only when the Metal binary is not properly built. Could you double check?
We've fixed several related issues in the past month, but could you double check whether the issue persists? If so, please open a new issue with detailed information so...
We are not using `tune_relax` because it only supports static-shape workloads. We will release a tutorial soon.
I'm not sure, but Vicuna-7B, as a language model, definitely suffers from potential hallucination.
At the moment, this project focuses on a single consumer-class GPU, making it possible for everyone to run models on their own laptops and phones. We will bring in distributed inference later.
We haven't announced Dolly yet, but it should work out of the box as of today. As for the issue you reported: I ran into it once when I hadn't compiled Metal...
This should work properly on the latest HEAD. Please feel free to open a new issue if the problem persists :-)
Yep of course :-)
It’s up! https://twitter.com/bohanhou1998/status/1655772690760994818
Hey, I made a Docker image that may help benchmark MLC LLM performance: https://github.com/junrushao/llm-perf-bench On the other hand, I don't really think Docker is a perfect abstraction for these use cases...