Max Caldwell

8 comments of Max Caldwell

I got this working as well! Inference time seems to increase more than linearly with prompt size: 3 seconds of audio takes 10 seconds of generation; 8s of audio:...
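
A quick way to sanity-check that scaling claim (not from the original comment; `transcribe` and the clip list are hypothetical stand-ins for whatever model call and test files you have):

```python
# Rough timing harness to check how generation time scales with input length;
# `transcribe` is a hypothetical stand-in for the actual model call.
import time

def scaling_check(transcribe, clips):
    """clips: iterable of (seconds_of_audio, path_to_clip) pairs."""
    for seconds, path in clips:
        start = time.perf_counter()
        transcribe(path)
        elapsed = time.perf_counter() - start
        # If elapsed/seconds grows as seconds grows, scaling is superlinear.
        print(f"{seconds:>4}s audio -> {elapsed:.1f}s generation "
              f"({elapsed / seconds:.2f}x realtime)")
```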

+1, agreed, but in the CLI lib here: https://github.com/mlc-ai/mlc-llm/blob/main/mlc_llm/utils.py you can see some arguments available that might work like you're asking for. There are some models available that, if you...

@junrushao how can we find tokens/sec? I'd say 'quite fast': it's the fastest LLM I've run on this 2020 MacBook Pro M1 8GB. 10x faster than your WebGPU demo running with less...
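
If the runtime doesn't report it, a minimal sketch of measuring tokens/sec by hand (not from the original thread; `generate` and `tokenizer` are hypothetical stand-ins for whatever runtime and tokenizer you are benchmarking):

```python
# Minimal tok/s measurement; `generate` and `tokenizer` are hypothetical
# handles, not a real mlc-llm API.
import time

def tokens_per_second(generate, tokenizer, prompt: str) -> float:
    start = time.perf_counter()
    output = generate(prompt)                # generated text only
    elapsed = time.perf_counter() - start
    return len(tokenizer.encode(output)) / elapsed
```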

Killer, I'm at encode: 31.9 tok/s, decode: 11.4 tok/s on a 2020 MacBook Pro M1 8GB with the default vicuna 6b. For reference, my decode on the WebGPU demo is like,...

Confirmed this is also happening for the new Hermes Pro model with many different variations of this template. TEMPLATE """{{ if .System }}<|im_start|>system {{ .System }}<|im_end|> {{ end }}{{ if...

Update from @mchiang0610: all of our files need these for ChatML: `PARAMETER stop <|im_start|>` and `PARAMETER stop <|im_end|>`. @olafgeibig try yours without the quotation marks?
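
For anyone landing here, a minimal ChatML Modelfile sketch putting the template and stop parameters together (the `FROM` path is a placeholder; this is an illustration, not the exact file from the thread):

```
FROM ./model.gguf
TEMPLATE """{{ if .System }}<|im_start|>system
{{ .System }}<|im_end|>
{{ end }}{{ if .Prompt }}<|im_start|>user
{{ .Prompt }}<|im_end|>
{{ end }}<|im_start|>assistant
"""
PARAMETER stop <|im_start|>
PARAMETER stop <|im_end|>
```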

Is there any more information about what's needed to author a `convert.py` for a given model? I'm seeing a lot of similarities between them in terms of loading the weights...
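
A sketch of the shared shape these scripts seem to follow (hypothetical: `WEIGHT_MAP` and the key names are made-up examples, not any project's real mapping):

```python
# Hypothetical convert.py skeleton based on the common structure described
# above: load a checkpoint, rename keys, downcast, save.
import torch

WEIGHT_MAP = {
    # source checkpoint key        -> target runtime key (illustrative only)
    "model.embed_tokens.weight": "embedding.weight",
}

def convert(src_path: str, dst_path: str) -> None:
    state = torch.load(src_path, map_location="cpu")
    out = {}
    for src_key, tensor in state.items():
        # Rename keys the target runtime expects; pass the rest through.
        out[WEIGHT_MAP.get(src_key, src_key)] = tensor.half()  # downcast step
    torch.save(out, dst_path)

if __name__ == "__main__":
    convert("pytorch_model.bin", "converted.bin")
```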