Junyang Lin
This is really a terrific idea! I think we should collect more data for fine-tuning to make things work better.
Looks fine to me; the 1.8B hyperparameters are slightly different, so check generation_config. I tested the q4_k_m GGUF and it also outputs the agent content normally.

```
FROM
TEMPLATE """{{ if .System }}<|im_start|>system
{{ .System }}<|im_end|>
{{ end }}{{ if .Prompt }}<|im_start|>user
{{ .Prompt }}<|im_end|>
{{ end }}<|im_start|>assistant
"""
SYSTEM """You are a helpful...
```
Will GPTQ be supported?
I should say it might be quite difficult to incorporate our NTK method with continuous batching. Sorry for the inconvenience...
```shell
python -m vllm.entrypoints.openai.api_server --host 0.0.0.0 --port 8000 --model Qwen1.5-0.5B-Chat --dtype=half
```
Running this command can replicate the issue, right?
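For anyone trying to reproduce: the command above starts an OpenAI-compatible server, so a request like the following should hit it. This is a minimal sketch assuming the host, port, and model name from that command; the prompt and `max_tokens` value are placeholders, and the actual send is left commented out since it needs the server running.

```python
import json

# Endpoint implied by the launch command above (--host 0.0.0.0 --port 8000).
BASE_URL = "http://0.0.0.0:8000/v1/chat/completions"

# Chat-completions payload; model name matches the --model flag above.
payload = {
    "model": "Qwen1.5-0.5B-Chat",
    "messages": [{"role": "user", "content": "Hello"}],  # placeholder prompt
    "max_tokens": 32,  # arbitrary small cap for a quick repro
}

body = json.dumps(payload)
print(body)

# To actually send it once the server is up:
# import urllib.request
# req = urllib.request.Request(
#     BASE_URL, data=body.encode(),
#     headers={"Content-Type": "application/json"},
# )
# print(urllib.request.urlopen(req).read().decode())
```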
This is a known issue with the models. There is no good temporary fix until we update the checkpoints.
No ETA; still working on it.
Yeah, it even outcompetes 72B by a lot. But since these benchmarks sometimes fail to really reflect quality, and since we have confidence in Chinese, we did...
We'll take a look, but strange things may happen with quantized models from time to time. Not sure if we can solve this, but we advise you to use a...
What does "custom embedding data" refer to? Please describe your problem in more detail.