Storm

Results: 24 comments by Storm

> Repeat is kind of normal in LLM models. Here are some possible solutions:
>
> 1. Try to use `do_sample=True` in the generate API?
> 2. Change the `woq_config` args...
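As a rough, library-free sketch of what `do_sample=True` changes: greedy decoding always takes the argmax token, which is one reason outputs can loop, while sampling draws the next token from the softmax distribution. The logits below are made-up toy values, not from any real model.

```python
import math
import random

def greedy_pick(logits):
    # do_sample=False behavior: always the argmax token -> deterministic,
    # so a repetitive loop, once entered, repeats forever
    return max(range(len(logits)), key=lambda i: logits[i])

def sample_pick(logits, temperature=1.0, rng=random):
    # do_sample=True behavior: draw from the (temperature-scaled) softmax
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(l - m) for l in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    r = rng.random()
    acc = 0.0
    for i, p in enumerate(probs):
        acc += p
        if r < acc:
            return i
    return len(probs) - 1

toy_logits = [1.0, 2.0, 0.5]
print(greedy_pick(toy_logits))                       # always index 1
print(sample_pick(toy_logits, temperature=0.7))      # varies run to run
```

This is only the conceptual difference; in `transformers`, the same switch is made by passing `do_sample=True` (optionally with `temperature`) to `generate`.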

> 2024-05-21 16:45:22 GPU-99 dbgpt.storage.vector_store.connector[3350824] INFO VectorStore:

+1. You could provide two sets of functions, `run` and `arun`; many tools do it this way now, and it is convenient to use. A purely synchronous interface is still not very friendly for production environments.
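A minimal sketch of the `run`/`arun` dual-entry-point pattern mentioned above. The `Tool` class and its return value are hypothetical; the point is that the async variant delegates to the sync one without blocking the event loop.

```python
import asyncio

class Tool:
    """Hypothetical tool exposing both sync and async entry points."""

    def run(self, query: str) -> str:
        # Synchronous entry point; may do blocking I/O or heavy work.
        return f"result for {query}"

    async def arun(self, query: str) -> str:
        # Async entry point: offload the sync implementation to a worker
        # thread so the event loop stays responsive (Python 3.9+).
        return await asyncio.to_thread(self.run, query)

print(Tool().run("vector search"))
print(asyncio.run(Tool().arun("vector search")))
```

Callers in synchronous code use `run`; async frameworks `await arun`, and both share one implementation.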

> Hi, how did you start that script? Did you change the script to set the temperature? https://github.com/vllm-project/vllm/blob/main/benchmarks/backend_request_func.py#L321

Modify the temperature value on that line. I just ran benchmark_serving.py directly with python, changing the address, model, and so on in the args.

> Hi, if you have modified the script and would like to receive coherent responses, you probably also want to modify the repetition penalty, stop tokens, endpoint (use chat completions)...
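To illustrate the parameters named in that reply, here is a sketch of a request body one might send to an OpenAI-compatible chat completions endpoint such as vLLM's `/v1/chat/completions`. The model name and stop token are placeholders; `repetition_penalty` is a vLLM-specific extra sampling parameter.

```python
import json

# Hypothetical request body; adjust model name and stop tokens to your model.
payload = {
    "model": "my-model",                                   # placeholder
    "messages": [{"role": "user", "content": "Hello"}],
    "temperature": 0.7,            # > 0 enables varied sampling
    "repetition_penalty": 1.1,     # vLLM extra param; discourages loops
    "stop": ["<|im_end|>"],        # stop token(s) appropriate for the model
}
print(json.dumps(payload, indent=2))
```

Using the chat completions endpoint (rather than plain completions) also applies the model's chat template, which matters for instruction-tuned models.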

> Hi, I had tested the whole thing before my first comment. But have you not experienced any duplication? I see many buddies, just like me, experiencing repetition. What's going...

[vllm_benchmark.zip](https://github.com/user-attachments/files/15912331/vllm_benchmark.zip) @jklj077

> Hi, the files you provided were heavily modified and a lot of things were hard-coded. After the backend kept giving me Bad Request, I just gave up. > >...

@ZhaoqiongZ My machine has 4 sockets. Is that not supported? I had tried this before: `numactl -C 0-63 -m 0 python run.py --benchmark -m /root/models/baichuan-inc/Baichuan2-13B-Chat --dtype bfloat16 --ipex --token-latency --num-iter 1...

@ZhaoqiongZ Thanks! I would like to ask what you mean by multiple instances? Multiple machines? Is it impossible for someone like me who only has 1 machine and 4 sockets to...
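My understanding of "multiple instances" on a single multi-socket machine, sketched below: launch one process per socket, each pinned to that socket's cores and local memory with `numactl`. The core ranges assume 16 cores per socket and are only an illustration; check the real topology with `numactl -H`. The script and flags are taken from the command above.

```shell
#!/bin/sh
# Hypothetical 4-instance launch on a 4-socket box (16 cores/socket assumed).
# -C pins CPU cores, -m binds memory allocation to the matching NUMA node.
MODEL=/root/models/baichuan-inc/Baichuan2-13B-Chat

numactl -C 0-15  -m 0 python run.py --benchmark -m "$MODEL" --dtype bfloat16 --ipex &
numactl -C 16-31 -m 1 python run.py --benchmark -m "$MODEL" --dtype bfloat16 --ipex &
numactl -C 32-47 -m 2 python run.py --benchmark -m "$MODEL" --dtype bfloat16 --ipex &
numactl -C 48-63 -m 3 python run.py --benchmark -m "$MODEL" --dtype bfloat16 --ipex &
wait
```

Binding each instance to one socket avoids cross-socket memory traffic, which is usually why a single process spanning sockets (e.g. `-C 0-63 -m 0`) performs poorly.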