Tabby should support downloading multi-file models, e.g., Qwen2.5

Please describe the feature you want

Tabby currently downloads a GGUF model from the URL specified in the model registry, but it supports only one URL per model; the vec of URLs is used only to select a single URL according to TABBY_DOWNLOAD_HOST.

https://github.com/TabbyML/tabby/blob/ca7895b2f80f81c2b723ab7d4bd1f3fc5edd32fc/crates/tabby-common/src/registry.rs#L10-L19

Qwen 2.5 models are split into multiple files per model:

https://huggingface.co/Qwen/Qwen2.5-7B-Instruct-GGUF/tree/main
https://huggingface.co/Qwen/Qwen2.5-Coder-7B-Instruct-GGUF/tree/main

Solution

Specify the URL of the first part using the standard filename format, e.g., qwen2.5-coder-7b-instruct-q8_0-00001-of-00003.gguf. We can parse both the index and the total number of parts from the filename, then download all parts of the GGUF.

This is also how llama-server supports split models.

Tabby currently saves the model file as model.gguf; for split models, we should also append the suffix, e.g., -00001-of-00003.gguf.
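
The filename parsing proposed above could be sketched roughly as follows. This is only an illustration under stated assumptions, not Tabby's actual implementation: `parse_split_name`, `all_part_urls`, and `local_part_name` are hypothetical helpers, the five-digit zero-padded `-NNNNN-of-NNNNN.gguf` suffix is assumed from the example filename, and the local name `model-00001-of-00003.gguf` is one possible reading of "append the suffix".

```rust
/// Parsed form of a split-GGUF filename such as
/// `qwen2.5-coder-7b-instruct-q8_0-00001-of-00003.gguf`.
/// (Hypothetical type, introduced only for this sketch.)
#[derive(Debug, PartialEq)]
struct SplitName {
    /// Everything before the `-00001-of-00003.gguf` suffix.
    prefix: String,
    /// 1-based index of this part.
    index: u32,
    /// Total number of parts.
    total: u32,
}

/// Returns true if `s` is exactly five ASCII digits (assumed padding width).
fn is_five_digits(s: &str) -> bool {
    s.len() == 5 && s.bytes().all(|b| b.is_ascii_digit())
}

/// Parses `<prefix>-<index>-of-<total>.gguf`.
/// Returns `None` when the name does not follow that convention.
fn parse_split_name(filename: &str) -> Option<SplitName> {
    let stem = filename.strip_suffix(".gguf")?;
    // Split from the right so that dashes inside the prefix are preserved.
    let (rest, total) = stem.rsplit_once("-of-")?;
    let (prefix, index) = rest.rsplit_once('-')?;
    if !is_five_digits(index) || !is_five_digits(total) {
        return None;
    }
    let index: u32 = index.parse().ok()?;
    let total: u32 = total.parse().ok()?;
    if index == 0 || index > total {
        return None;
    }
    Some(SplitName {
        prefix: prefix.to_string(),
        index,
        total,
    })
}

/// Given the URL of one part, builds the URLs of all parts (1..=total).
/// Returns `None` for a single-file model such as `.../model.gguf`.
fn all_part_urls(part_url: &str) -> Option<Vec<String>> {
    let (base, filename) = part_url.rsplit_once('/')?;
    let name = parse_split_name(filename)?;
    Some(
        (1..=name.total)
            .map(|i| format!("{}/{}-{:05}-of-{:05}.gguf", base, name.prefix, i, name.total))
            .collect(),
    )
}

/// Local filename for one part, keeping the split suffix (assumed naming).
fn local_part_name(index: u32, total: u32) -> String {
    format!("model-{:05}-of-{:05}.gguf", index, total)
}
```

For example, passing a placeholder URL such as `https://example.com/models/qwen2.5-coder-7b-instruct-q8_0-00001-of-00003.gguf` to `all_part_urls` would yield three URLs ending in `-00001-of-00003.gguf`, `-00002-of-00003.gguf`, and `-00003-of-00003.gguf`, while a plain `model.gguf` URL yields `None` so the existing single-file path can be used unchanged.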

Please reply with a 👍 if you want this feature.

Sep 21 '24 09:09 zwpaper

Qwen 2.5 Coder 7B outperforms many models out there, so yes, this is a must-have.

Sep 23 '24 14:09 katopz

I can use Ollama with this data/config.toml:

[model.completion.http]
kind = "ollama/completion"
model_name = "qwen2.5-coder:7b-base"
api_endpoint = "YOUR_ENDPOINT"
prompt_template = "<|fim_prefix|>{prefix}<|fim_suffix|>{suffix}<|fim_middle|>"

[model.chat.http]
kind = "openai/chat"
model_name = "qwen2.5-coder"
api_endpoint = "YOUR_ENDPOINT/v1"
api_key = "dummy"

Completion works, but it returns <|endoftext|>. I don't know whether this is an Ollama problem or a Tabby problem.

Sep 24 '24 02:09 vpckso

@zwpaper I can work on this issue. However, I think inferring the remaining files from the first filename might lead to problems, since we can't be sure that every model's files follow that naming pattern. I believe it would be better to select the files with a regular expression or to specify all of them explicitly. What do you think?

Sep 28 '24 05:09 umialpha

Fixed in 0.19.

Oct 31 '24 16:10 wsxiaoys