Storm

Results: 24 comments by Storm

> Repeat is kind of normal in LLM models. Here are some possible solutions:
>
> 1. Try to use `do_sample=True` in the generate API?
> 2. Change the `woq_config` args...
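As a rough, library-free sketch of what `do_sample=True` changes: greedy decoding always takes the argmax token, which is one reason outputs can loop, while sampling draws the next token from the softmax distribution. The logits below are made-up toy values, not from any real model.

```python
import math
import random

def greedy_pick(logits):
    # do_sample=False behavior: always the argmax token -> deterministic,
    # so a repetitive loop, once entered, repeats forever
    return max(range(len(logits)), key=lambda i: logits[i])

def sample_pick(logits, temperature=1.0, rng=random):
    # do_sample=True behavior: draw from the (temperature-scaled) softmax
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(l - m) for l in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    r = rng.random()
    acc = 0.0
    for i, p in enumerate(probs):
        acc += p
        if r < acc:
            return i
    return len(probs) - 1

toy_logits = [1.0, 2.0, 0.5]
print(greedy_pick(toy_logits))                       # always index 1
print(sample_pick(toy_logits, temperature=0.7))      # varies run to run
```

This is only the conceptual difference; in `transformers`, the same switch is made by passing `do_sample=True` (optionally with `temperature`) to `generate`.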

> 2024-05-21 16:45:22 GPU-99 dbgpt.storage.vector_store.connector[3350824] INFO VectorStore:

+1. You could provide two sets of functions, `run` and `arun`; many tools do it this way now, and it is convenient to use. A purely synchronous interface is still not very friendly for production environments.
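A minimal sketch of the `run`/`arun` dual-entry-point pattern mentioned above. The `Tool` class and its return value are hypothetical; the point is that the async variant delegates to the sync one without blocking the event loop.

```python
import asyncio

class Tool:
    """Hypothetical tool exposing both sync and async entry points."""

    def run(self, query: str) -> str:
        # Synchronous entry point; may do blocking I/O or heavy work.
        return f"result for {query}"

    async def arun(self, query: str) -> str:
        # Async entry point: offload the sync implementation to a worker
        # thread so the event loop stays responsive (Python 3.9+).
        return await asyncio.to_thread(self.run, query)

print(Tool().run("vector search"))
print(asyncio.run(Tool().arun("vector search")))
```

Callers in synchronous code use `run`; async frameworks `await arun`, and both share one implementation.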

> Hi, how did you start that script? Did you change the script to set the temperature? https://github.com/vllm-project/vllm/blob/main/benchmarks/backend_request_func.py#L321

Modify the temperature value on that line. I just ran benchmark_serving.py directly with python, changing the address, model, and so on in the args.

> Hi, if you have modified the script and would like to receive coherent responses, you probably also want to modify the repetition penalty, stop tokens, endpoint (use chat completions)...
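To illustrate the parameters named in that reply, here is a sketch of a request body one might send to an OpenAI-compatible chat completions endpoint such as vLLM's `/v1/chat/completions`. The model name and stop token are placeholders; `repetition_penalty` is a vLLM-specific extra sampling parameter.

```python
import json

# Hypothetical request body; adjust model name and stop tokens to your model.
payload = {
    "model": "my-model",                                   # placeholder
    "messages": [{"role": "user", "content": "Hello"}],
    "temperature": 0.7,            # > 0 enables varied sampling
    "repetition_penalty": 1.1,     # vLLM extra param; discourages loops
    "stop": ["<|im_end|>"],        # stop token(s) appropriate for the model
}
print(json.dumps(payload, indent=2))
```

Using the chat completions endpoint (rather than plain completions) also applies the model's chat template, which matters for instruction-tuned models.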

> Hi, I had tested the whole thing before my first comment. But have you not experienced any duplication? I see many buddies, just like me, experiencing repetition. What's going...

[vllm_benchmark.zip](https://github.com/user-attachments/files/15912331/vllm_benchmark.zip) @jklj077

> Hi, the files you provided were heavily modified and a lot of things were hard-coded. After the backend kept giving me Bad Request, I just gave up. > >...

@ZhaoqiongZ My machine has 4 sockets. Is that not supported? I had tried this before: `numactl -C 0-63 -m 0 python run.py --benchmark -m /root/models/baichuan-inc/Baichuan2-13B-Chat --dtype bfloat16 --ipex --token-latency --num-iter 1...

@ZhaoqiongZ Thanks! I would like to ask what you mean by multiple instances? Multiple machines? Is it impossible for someone like me who only has 1 machine and 4 sockets to...
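My understanding of "multiple instances" on a single multi-socket machine, sketched below: launch one process per socket, each pinned to that socket's cores and local memory with `numactl`. The core ranges assume 16 cores per socket and are only an illustration; check the real topology with `numactl -H`. The script and flags are taken from the command above.

```shell
#!/bin/sh
# Hypothetical 4-instance launch on a 4-socket box (16 cores/socket assumed).
# -C pins CPU cores, -m binds memory allocation to the matching NUMA node.
MODEL=/root/models/baichuan-inc/Baichuan2-13B-Chat

numactl -C 0-15  -m 0 python run.py --benchmark -m "$MODEL" --dtype bfloat16 --ipex &
numactl -C 16-31 -m 1 python run.py --benchmark -m "$MODEL" --dtype bfloat16 --ipex &
numactl -C 32-47 -m 2 python run.py --benchmark -m "$MODEL" --dtype bfloat16 --ipex &
numactl -C 48-63 -m 3 python run.py --benchmark -m "$MODEL" --dtype bfloat16 --ipex &
wait
```

Binding each instance to one socket avoids cross-socket memory traffic, which is usually why a single process spanning sockets (e.g. `-C 0-63 -m 0`) performs poorly.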