Failed to run benchmark scripts against the endpoint
🐛 Describe the bug
python3 benchmark_serving.py --backend vllm --model deepseek-ai/deepseek-r1 --trust-remote-code --served-model-name deepseek-r1-671b --base-url http://localhost:8888 --endpoint /v1/completions --num-prompts 100 --request-rate 2 --metric_percentiles '50,90,95,99' --goodput ttft:1000 tpot:100 --max-concurrency 200 --random-input-len 2048 --random-output-len 200 --dataset-name random --ignore-eos
Starting initial single prompt test run...
RequestFuncOutput(generated_text='', success=False, latency=0.0, output_tokens=0, ttft=0.0, itl=[], tpot=0.0, prompt_len=2048, error='Bad Request')
Traceback (most recent call last):
File "/Users/bytedance/workspace/vllm/benchmarks/benchmark_serving.py", line 1315, in <module>
main(args)
File "/Users/bytedance/workspace/vllm/benchmarks/benchmark_serving.py", line 951, in main
benchmark_result = asyncio.run(
File "/Users/bytedance/.pyenv/versions/3.10.10/lib/python3.10/asyncio/runners.py", line 44, in run
return loop.run_until_complete(main)
File "/Users/bytedance/.pyenv/versions/3.10.10/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
return future.result()
File "/Users/bytedance/workspace/vllm/benchmarks/benchmark_serving.py", line 602, in benchmark
raise ValueError(
ValueError: Initial test run failed - Please make sure benchmark arguments are correctly specified. Error: Bad Request
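To see the actual error body behind the "Bad Request" (which the benchmark swallows), a single request can be sent straight to the gateway. A minimal sketch, reusing the base URL, endpoint, and served model name from the command above; the `requests` package is my choice, any HTTP client works:

```python
# Probe mirroring the benchmark's initial single-prompt test run.
# Prints the status code and raw body so the 400 reason is visible.
import requests

payload = {
    "model": "deepseek-r1-671b",   # served model name from the command above
    "prompt": "Hello, world!",
    "max_tokens": 16,
    "stream": True,                # benchmark_serving streams completions
    "ignore_eos": True,
}
resp = requests.post(
    "http://localhost:8888/v1/completions",
    json=payload,
    stream=True,
    timeout=60,
)
print(resp.status_code)
if resp.status_code != 200:
    print(resp.text)               # the gateway's error body
else:
    for line in resp.iter_lines():
        if line:
            print(line.decode())
```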
Gateway logs:
I0303 00:53:49.475583 1 gateway.go:221]
I0303 00:53:49.475604 1 gateway.go:222] "-- In RequestHeaders processing ..." requestID="4cb7758a-aa7c-49e5-a6d0-8243aba62a19"
I0303 00:53:49.475949 1 gateway.go:287] "-- In RequestBody processing ..." requestID="4cb7758a-aa7c-49e5-a6d0-8243aba62a19"
I0303 00:53:49.476224 1 gateway.go:388] "request start" requestID="4cb7758a-aa7c-49e5-a6d0-8243aba62a19" model="deepseek-r1-671b" routingStrategy="random" targetPodIP="192.168.0.74:8000"
I0303 00:53:49.477602 1 gateway.go:407] "-- In ResponseHeaders processing ..." requestID="4cb7758a-aa7c-49e5-a6d0-8243aba62a19"
I0303 00:53:49.477827 1 gateway.go:440] "-- In ResponseBody processing ..." requestID="4cb7758a-aa7c-49e5-a6d0-8243aba62a19" endOfSteam=false
I0303 00:53:49.477858 1 gateway.go:440] "-- In ResponseBody processing ..." requestID="4cb7758a-aa7c-49e5-a6d0-8243aba62a19" endOfSteam=false
I0303 00:53:49.477869 1 gateway.go:440] "-- In ResponseBody processing ..." requestID="4cb7758a-aa7c-49e5-a6d0-8243aba62a19" endOfSteam=true
192.168.0.74 is the head pod, but no request is reaching the engine side. Could this be a streaming issue? One way to check is sketched below.
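A sketch for testing that hypothesis: send the same prompt with and without streaming through the gateway and compare. If only the streaming variant fails, the problem is in how the gateway forwards SSE chunks rather than in the engine itself. (Model name and URL are taken from the command above.)

```python
# Compare a non-streaming and a streaming completion through the gateway.
import requests

URL = "http://localhost:8888/v1/completions"
BASE = {"model": "deepseek-r1-671b", "prompt": "Say hi.", "max_tokens": 8}

def try_request(stream: bool) -> None:
    resp = requests.post(URL, json={**BASE, "stream": stream},
                         stream=stream, timeout=60)
    print(f"stream={stream} -> HTTP {resp.status_code}")
    if stream:
        for line in resp.iter_lines():
            if line:
                print(" ", line.decode()[:120])
    else:
        print(" ", resp.text[:200])

try_request(False)
try_request(True)
```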
Steps to Reproduce
deepseek-r1.yaml
Expected behavior
The benchmark should work as expected.
Environment
0.2.0
Might be related to #757. I discovered that issue when using SGLang's bench_serving, which should be quite similar to vLLM's benchmark_serving.
Using this script, I am able to run the benchmark. It requires the fix in PR #794, and since our local cluster has no HTTPRoute, I had to use the routing-strategy option to work around it (see the sketch below).
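For reference, a sketch of selecting the routing strategy per request. The `routing-strategy` header name is an assumption on my part, based on the `routingStrategy` field in the gateway logs above:

```python
# Workaround sketch when no HTTPRoute is configured: pick the routing
# strategy via a request header (header name assumed, not confirmed).
import requests

resp = requests.post(
    "http://localhost:8888/v1/completions",
    headers={"routing-strategy": "random"},  # e.g. random / prefix-cache
    json={"model": "deepseek-r1-671b", "prompt": "ping", "max_tokens": 4},
    timeout=60,
)
print(resp.status_code, resp.text[:200])
```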
@varungup90 I made the above change, but it still doesn't seem to work. Your script is exactly the same as the one I attached in the issue, right?
Just noticed your test against https://github.com/vllm-project/aibrix/pull/794. I left some comments there. I think we should remove this validation from the gateway side entirely and instead handle error responses from the engines better. We should not make any changes in the gateway that may break OpenAI API compatibility. Recently we have had a few cases like this: streaming, error-response handling, argument validation, etc.
I think we should remove this validation from the gateway side entirely
This is not validation; it reads the message for the prefix cache. One alternative is to call getRequestMessage only for the prefix-cache routing strategy, but that would still need the same fix.
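A minimal sketch of that alternative, written in Python purely for illustration (the gateway itself is Go, and every name here besides getRequestMessage is hypothetical): only parse the body for the prefix-cache strategy, and degrade gracefully when parsing fails instead of returning a 400.

```python
import json
import random

def get_request_message(body: bytes) -> str:
    """Hypothetical stand-in for the gateway's getRequestMessage."""
    return json.loads(body)["prompt"]

def select_pod(strategy: str, body: bytes, pods: list[str]) -> str:
    # Only the prefix-cache strategy needs the message body; other
    # strategies never parse it, so a body the gateway cannot handle
    # can no longer turn into a 400 for them.
    if strategy == "prefix-cache":
        try:
            message = get_request_message(body)
        except (ValueError, KeyError):
            return pods[0]  # fall back instead of rejecting the request
        return pods[hash(message) % len(pods)]  # toy prefix-affinity pick
    return random.choice(pods)
```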
@varungup90 did you test against the DeepSeek model? I used the latest release-0.2 version and the problem still exists; I tried both HTTPRoute and the custom routing strategy.
INFO 03-09 11:49:41 importing.py:10] Triton not installed; certain GPU-related functions will not be available.
Namespace(backend='vllm', base_url='http://localhost:8888', host='127.0.0.1', port=8000, endpoint='/v1/completions', dataset_name='random', dataset_path=None, max_concurrency=200, model='deepseek-ai/deepseek-r1', tokenizer=None, best_of=1, use_beam_search=False, num_prompts=100, logprobs=None, request_rate=2.0, burstiness=1.0, seed=0, trust_remote_code=True, disable_tqdm=False, profile=False, save_result=False, metadata=None, result_dir=None, result_filename=None, ignore_eos=True, percentile_metrics='ttft,tpot,itl', metric_percentiles='50,90,95,99', goodput=['ttft:1000', 'tpot:100'], sonnet_input_len=550, sonnet_output_len=150, sonnet_prefix_len=200, sharegpt_output_len=None, random_input_len=2048, random_output_len=200, random_range_ratio=1.0, random_prefix_len=0, hf_subset=None, hf_split=None, hf_output_len=None, tokenizer_mode='auto', served_model_name='deepseek-r1-671b', lora_modules=None)
Starting initial single prompt test run...
Traceback (most recent call last):
File "/Users/bytedance/workspace/vllm/benchmarks/benchmark_serving.py", line 1314, in <module>
main(args)
File "/Users/bytedance/workspace/vllm/benchmarks/benchmark_serving.py", line 950, in main
benchmark_result = asyncio.run(
File "/Users/bytedance/.pyenv/versions/3.10.10/lib/python3.10/asyncio/runners.py", line 44, in run
return loop.run_until_complete(main)
File "/Users/bytedance/.pyenv/versions/3.10.10/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
return future.result()
File "/Users/bytedance/workspace/vllm/benchmarks/benchmark_serving.py", line 601, in benchmark
raise ValueError(
ValueError: Initial test run failed - Please make sure benchmark arguments are correctly specified. Error: Never received a valid chunk to calculate TTFT.This response will be marked as failed!
Varun found it's more likely a data issue. I switched from the random to the ShareGPT dataset: --dataset-name sharegpt --dataset-path /Users/bytedance/Downloads/ShareGPT_V3_unfiltered_cleaned_split.json
Note: --dataset-name sonnet --dataset-path /Users/bytedance/workspace/vllm/benchmarks/sonnet.txt works as well.
The dataset is here:
https://huggingface.co/datasets/anon8231489123/ShareGPT_Vicuna_unfiltered/blob/main/ShareGPT_V3_unfiltered_cleaned_split.json
The full command is here:
python benchmark_serving.py --backend vllm --model deepseek-ai/deepseek-r1 --trust-remote-code --served-model-name deepseek-r1-671b --base-url http://localhost:8888 --endpoint /v1/completions --num-prompts 100 --request-rate 2 --metric_percentiles '50,90,95,99' --goodput ttft:1000 tpot:100 --max-concurrency 200 --random-input-len 2048 --random-output-len 200 --dataset-name sharegpt --dataset-path /Users/bytedance/Downloads/ShareGPT_V3_unfiltered_cleaned_split.json --ignore-eos
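The random-dataset failure fits how those prompts are generated: benchmark_serving builds them from randomly sampled token ids, and decoding random ids can produce unusual byte sequences or special characters. An illustrative sketch, using GPT-2's tokenizer as a stand-in (the benchmark actually uses the served model's tokenizer):

```python
# Illustration: "random" prompts come from random token ids, and
# decoding them often yields odd unicode fragments.
import random
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")  # stand-in tokenizer
ids = [random.randrange(tok.vocab_size) for _ in range(32)]
print(repr(tok.decode(ids)))  # frequently contains unusual characters
```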
Currently, at least, we can confirm the previous code fix works fine (streaming + completions API). For the special-character issues, we need more investigation in https://github.com/vllm-project/aibrix/issues/832; let's track that separately.
This is fixed now.