
Should support downloading multi-file models, e.g., qwen2.5

Open zwpaper opened this issue 1 year ago • 3 comments

Please describe the feature you want

Tabby currently downloads a GGUF model from the URL specified in the model registry, but it supports only one URL per model; the vec of URLs is used only to select a single URL according to TABBY_DOWNLOAD_HOST.

https://github.com/TabbyML/tabby/blob/ca7895b2f80f81c2b723ab7d4bd1f3fc5edd32fc/crates/tabby-common/src/registry.rs#L10-L19

Qwen 2.5 models are split across multiple files per model:

  • https://huggingface.co/Qwen/Qwen2.5-7B-Instruct-GGUF/tree/main
  • https://huggingface.co/Qwen/Qwen2.5-Coder-7B-Instruct-GGUF/tree/main

solution

Specify the URL of the first part using the standard filename format, e.g., qwen2.5-coder-7b-instruct-q8_0-00001-of-00003.gguf. We can then parse both the index and the total number of parts from the filename and download all parts of the GGUF.

This is also how llama-server supports split models.

Tabby currently saves the model file under the name model.gguf; for split models, we should also append the part suffix, e.g., -00001-of-00003.gguf.
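
A minimal sketch of the filename parsing described above, using only the standard library. The function names (`parse_split_name`, `all_part_urls`, `local_file_name`) are hypothetical and not part of Tabby; the sketch also assumes five-digit, zero-padded part numbers, a URL without a query string, and that "appending the suffix" means local names of the form model-00001-of-00003.gguf.

```rust
/// Hypothetical helper: split "name-00001-of-00003.gguf" into (prefix, index, total).
/// Returns None when the filename does not follow the split naming pattern.
fn parse_split_name(filename: &str) -> Option<(&str, u32, u32)> {
    let stem = filename.strip_suffix(".gguf")?;
    // Peel off "-of-<total>" first, then "-<index>".
    let (rest, total) = stem.rsplit_once("-of-")?;
    let (prefix, index) = rest.rsplit_once('-')?;
    // Assumption: part numbers are exactly five zero-padded digits.
    let is_part = |s: &str| s.len() == 5 && s.bytes().all(|b| b.is_ascii_digit());
    if !is_part(index) || !is_part(total) {
        return None;
    }
    let index: u32 = index.parse().ok()?;
    let total: u32 = total.parse().ok()?;
    if index == 0 || index > total {
        return None;
    }
    Some((prefix, index, total))
}

/// Expand the URL of any one part into the URLs of all parts, in order.
/// A URL that is not in split form is returned unchanged as a single entry.
fn all_part_urls(first_url: &str) -> Vec<String> {
    // Split into directory (keeping the trailing '/') and filename.
    let split_at = first_url.rfind('/').map_or(0, |i| i + 1);
    let (dir, filename) = first_url.split_at(split_at);
    match parse_split_name(filename) {
        Some((prefix, _, total)) => (1..=total)
            .map(|i| format!("{dir}{prefix}-{i:05}-of-{total:05}.gguf"))
            .collect(),
        None => vec![first_url.to_string()],
    }
}

/// Assumed local naming: "model.gguf" for a single file,
/// "model-00001-of-00003.gguf" for part 1 of 3.
fn local_file_name(index: u32, total: u32) -> String {
    if total <= 1 {
        "model.gguf".to_string()
    } else {
        format!("model-{index:05}-of-{total:05}.gguf")
    }
}
```

Under these assumptions, passing the URL of the first part to `all_part_urls` would yield one URL per part, while a plain single-file URL passes through untouched, so existing registry entries would keep working.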



Please reply with a 👍 if you want this feature.

zwpaper avatar Sep 21 '24 09:09 zwpaper

[image] Qwen 2.5 Coder 7B outperforms many of the models there, so yes, this is a must-have.

katopz avatar Sep 23 '24 14:09 katopz

I can use ollama with this data/config.toml:

[model.completion.http]
kind = "ollama/completion"
model_name = "qwen2.5-coder:7b-base"
api_endpoint = "YOUR_ENDPOINT"
prompt_template = "<|fim_prefix|>{prefix}<|fim_suffix|>{suffix}<|fim_middle|>"

[model.chat.http]
kind = "openai/chat"
model_name = "qwen2.5-coder"
api_endpoint = "YOUR_ENDPOINT/v1"
api_key = "dummy"

Completion runs, but it returns <|endoftext|>; I don't know whether this is an ollama or a Tabby problem.

vpckso avatar Sep 24 '24 02:09 vpckso

@zwpaper I can work on this issue. However, I think inferring the parts from the filename might lead to problems, since we can't be sure that every model follows that naming convention. I believe it would be better to match the files with a regular expression or to specify all the files explicitly. What do you think?

umialpha avatar Sep 28 '24 05:09 umialpha

Fixed in 0.19.

wsxiaoys avatar Oct 31 '24 16:10 wsxiaoys