[Feature] Consolidate performance benchmark datasets
Addressing #13351
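This PR moves dataset handling for the benchmark scripts behind one shared abstraction, so serving and throughput benchmarks sample requests the same way. A minimal sketch of that shape, with illustrative names (`BenchmarkDataset`, `SampleRequest`) rather than the PR's exact code:

```python
import random
from dataclasses import dataclass


@dataclass
class SampleRequest:
    prompt: str
    prompt_len: int
    expected_output_len: int


class BenchmarkDataset:
    """Shared base so benchmark_serving.py and benchmark_throughput.py
    sample requests identically."""

    def __init__(self, dataset_path: str | None = None, seed: int = 0):
        self.dataset_path = dataset_path
        self.rng = random.Random(seed)  # seeded for reproducible sampling

    def sample(self, num_requests: int) -> list[SampleRequest]:
        raise NotImplementedError("each dataset implements its own sampling")
```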
Benchmark Serving Results
after the change
| Dataset | Backend | Successful requests | Benchmark duration (s) | Total input tokens |
|---|---|---|---|---|
| sonnet | openai-chat | 100 | 4.15 | 54541 |
| hf-vision-arena | openai-chat | 100 | 11.06 | 16589 |
| hf | openai-chat | 100 | 9.73 | 1166 |
| sonnet | vllm | 100 | 3.76 | 54541 |
| sharegpt | vllm | 100 | 8.48 | 23260 |
| random | vllm | 100 | 4.96 | 102400 |
| burstgpt | vllm | 100 | 21.84 | 77561 |
before the change
| Dataset | Backend | Successful requests | Benchmark duration (s) | Total input tokens |
|---|---|---|---|---|
| sonnet | openai-chat | 100 | 4.01 | 54541 |
| hf-vision-arena | openai-chat | 100 | 11.21 | 16589 |
| hf | openai-chat | 100 | 9.83 | 1166 |
| sonnet | vllm | 100 | 3.75 | 54541 |
| sharegpt | vllm | 100 | 8.42 | 23260 |
| random | vllm | 100 | 4.84 | 102400 |
| burstgpt | vllm | 100 | 21.66 | 77561 |
MODEL_NAME="Qwen/Qwen2-VL-7B-Instruct"
"python3 benchmarks/benchmark_serving.py --backend openai-chat --model ${MODEL_NAME} --endpoint /v1/chat/completions --dataset-name sonnet --dataset-path benchmarks/sonnet.txt --num-prompts ${NUM_PROMPTS}"
"python3 benchmarks/benchmark_serving.py --model ${MODEL_NAME} --backend openai-chat --endpoint /v1/chat/completions --dataset-name hf --dataset-path lmarena-ai/vision-arena-bench-v0.1 --hf-split train --num-prompts ${NUM_PROMPTS} --request-rate 1000 --percentile-metrics ttft,tpot,e2el"
"python3 benchmarks/benchmark_serving.py --model ${MODEL_NAME} --backend openai-chat --endpoint /v1/chat/completions --dataset-name hf --dataset-path lmms-lab/LLaVA-OneVision-Data --hf-split train --hf-subset \"chart2text(cauldron)\" --num-prompts ${NUM_PROMPTS} --request-rate 1000 --percentile-metrics ttft,tpot,e2el"
"python3 benchmarks/benchmark_serving.py --backend vllm --model ${MODEL_NAME} --dataset-name sonnet --dataset-path benchmarks/sonnet.txt --num-prompts ${NUM_PROMPTS}"
"python3 benchmarks/benchmark_serving.py --backend vllm --model ${MODEL_NAME} --dataset-name sharegpt --dataset-path /home/jovyan/data/vllm_benchmark_datasets/ShareGPT_V3_unfiltered_cleaned_split.json --num-prompts ${NUM_PROMPTS}"
"python3 benchmarks/benchmark_serving.py --backend vllm --model ${MODEL_NAME} --dataset-name random --num-prompts ${NUM_PROMPTS}"
"python3 benchmarks/benchmark_serving.py --backend vllm --model ${MODEL_NAME} --dataset-name burstgpt --dataset-path /home/jovyan/data/vllm_benchmark_datasets/BurstGPT_without_fails_2.csv --num-prompts ${NUM_PROMPTS}"
Benchmark Throughput Results
after the change
| Dataset | Processed Prompts | Throughput (requests/s) | Total tokens/s | Output tokens/s |
|---|---|---|---|---|
| random | 10 | 50.44 | 1513.07 | 1008.71 |
| ShareGPT | 10 | 1.66 | 605.33 | 378.11 |
| sonnet | 10 | 7.62 | 4960.96 | 1142.38 |
| burstgpt | 10 | 2.17 | 2999.05 | 406.72 |
before the change (sonnet and burstgpt are not supported)
| Dataset | Processed Prompts | Throughput (requests/s) | Total tokens/s | Output tokens/s |
|---|---|---|---|---|
| random | 10 | 51.13 | 1534.02 | 1022.68 |
| ShareGPT | 10 | 1.66 | 604.19 | 377.39 |
| sonnet | 10 | N/A | N/A | N/A |
| burstgpt | 10 | N/A | N/A | N/A |
MODEL="NousResearch/Hermes-3-Llama-3.1-8B"
"VLLM_USE_V1=1 python3 benchmarks/benchmark_throughput.py --model $MODEL --input_len 10 --output_len 20 --dataset-name random --num-prompts $NUM_PROMPTS"
"VLLM_USE_V1=1 python3 benchmarks/benchmark_throughput.py --model $MODEL --dataset /home/jovyan/vllm/ShareGPT_V3_unfiltered_cleaned_split.json --num-prompts $NUM_PROMPTS"
"VLLM_USE_V1=1 python3 benchmarks/benchmark_throughput.py --model $MODEL --dataset-name sonnet --dataset benchmarks/sonnet.txt --num-prompts $NUM_PROMPTS"
"VLLM_USE_V1=1 python3 benchmarks/benchmark_throughput.py --model $MODEL --dataset /home/jovyan/data/vllm_benchmark_datasets/BurstGPT_without_fails_2.csv --dataset-name burstgpt --num-prompts $NUM_PROMPTS"
Benchmark Throughput Results - Image Support
Command copied from #9851. Since the COCO dataset is too large, I used random images here.
import numpy as np
from PIL import Image

# stand-in for a COCO sample: a random 256x256 RGB image
random_array = np.random.randint(0, 256, (256, 256, 3), dtype=np.uint8)
mm_data["image"] = Image.fromarray(random_array)
after the change, 1000 requests: Throughput: 15.47 requests/s, 3304.12 total tokens/s, 3048.79 output tokens/s
before the change, 1000 requests: Throughput: 14.90 requests/s, 3183.03 total tokens/s, 2937.06 output tokens/s
python benchmarks/benchmark_throughput.py \
--model mistral-community/pixtral-12b \
--max-model-len=8192 \
--dataset sharegpt4v_instruct_gpt4-vision_cap100k.json
LoRA request test
Commands are copied from PR #11267.
after the change
| Dataset | Num Prompts | Max Loras | Max Lora Rank | Enable Lora | Async Engine | Throughput (requests/s) | Total tokens/s | Output tokens/s |
|---|---|---|---|---|---|---|---|---|
| ShareGPT | 1000 | 1 | 8 | Yes | No | 11.66 | 5610.75 | 2742.90 |
| ShareGPT | 1000 | 4 | 8 | Yes | No | 11.59 | 5575.73 | 2725.78 |
| ShareGPT | 1000 | N/A | N/A | No | Yes | 17.42 | 8383.51 | 4098.41 |
| ShareGPT | 1000 | 1 | 8 | Yes | Yes | 11.50 | 5535.98 | 2706.35 |
| ShareGPT | 1000 | 4 | 8 | Yes | Yes | 11.25 | 5412.76 | 2646.11 |
before the change
| Dataset | Num Prompts | Max Loras | Max Lora Rank | Enable Lora | Async Engine | Throughput (requests/s) | Total tokens/s | Output tokens/s |
|---|---|---|---|---|---|---|---|---|
| ShareGPT | 1000 | 1 | 8 | Yes | No | 10.84 | 5216.17 | 2550.01 |
| ShareGPT | 1000 | 4 | 8 | Yes | No | 10.80 | 5197.68 | 2540.97 |
| ShareGPT | 1000 | N/A | N/A | No | Yes | 16.75 | 8061.23 | 3940.86 |
| ShareGPT | 1000 | 1 | 8 | Yes | Yes | 11.08 | 5332.47 | 2606.86 |
| ShareGPT | 1000 | 4 | 8 | Yes | Yes | 10.84 | 5215.25 | 2549.56 |
"python3 benchmarks/benchmark_throughput.py --model meta-llama/Llama-2-7b-hf --backend vllm --dataset ./ShareGPT_V3_unfiltered_cleaned_split.json --num-prompts $NUM_PROMPTS --max-loras 1 --max-lora-rank 8 --enable-lora --lora-path \"yard1/llama-2-7b-sql-lora-test\""
"python3 benchmarks/benchmark_throughput.py --model meta-llama/Llama-2-7b-hf --backend vllm --dataset ./ShareGPT_V3_unfiltered_cleaned_split.json --num-prompts $NUM_PROMPTS --max-loras 4 --max-lora-rank 8 --enable-lora --lora-path \"yard1/llama-2-7b-sql-lora-test\""
"python3 benchmarks/benchmark_throughput.py --model meta-llama/Llama-2-7b-hf --backend vllm --dataset ./ShareGPT_V3_unfiltered_cleaned_split.json --num-prompts $NUM_PROMPTS --async-engine"
"python3 benchmarks/benchmark_throughput.py --model meta-llama/Llama-2-7b-hf --backend vllm --dataset ./ShareGPT_V3_unfiltered_cleaned_split.json --num-prompts $NUM_PROMPTS --async-engine --max-loras 1 --max-lora-rank 8 --enable-lora --lora-path \"yard1/llama-2-7b-sql-lora-test\""
"python3 benchmarks/benchmark_throughput.py --model meta-llama/Llama-2-7b-hf --backend vllm --dataset ./ShareGPT_V3_unfiltered_cleaned_split.json --num-prompts $NUM_PROMPTS --async-engine --max-loras 4 --max-lora-rank 8 --enable-lora --lora-path \"yard1/llama-2-7b-sql-lora-test\""
RandomDataSet throughput test
after the change
This is changed to the random sampling defined in benchmark_serving.py
| Dataset | Processed Prompts | Throughput (requests/s) | Total tokens/s | Output tokens/s | Range Ratio | Prefix Len | Input Len | Output Len |
|---|---|---|---|---|---|---|---|---|
| random | 10 | 51.92 | 1277.24 | 752.85 | 0.5 | 2 | 10 | 20 |
| random | 10 | 39.35 | 1188.51 | 830.39 | 0.5 | 2 | 10 | 30 |
| random | 10 | 51.20 | 1689.73 | 834.62 | 0.5 | 2 | 20 | 20 |
| random | 10 | 36.60 | 1463.81 | 786.80 | 0.5 | 2 | 20 | 30 |
| random | 10 | 51.89 | 1660.44 | 1037.78 | 1.0 | 2 | 10 | 20 |
before the change
This is the original random sampling defined in benchmark_throughput.py. Range ratio and prefix len are not needed in the original throughput test's random sampling.
| Dataset | Processed Prompts | Throughput (requests/s) | Total tokens/s | Output tokens/s | Range Ratio | Prefix Len | Input Len | Output Len |
|---|---|---|---|---|---|---|---|---|
| random | 10 | 53.07 | 1592.00 | 1061.33 | N/A | N/A | 10 | 20 |
| random | 10 | 37.15 | 1486.08 | 1114.56 | N/A | N/A | 10 | 30 |
| random | 10 | 51.30 | 2052.04 | 1026.02 | N/A | N/A | 20 | 20 |
| random | 10 | 37.23 | 1861.38 | 1116.83 | N/A | N/A | 20 | 30 |
# parameter values are listed in the table above
VLLM_USE_V1=1 python3 benchmarks/benchmark_throughput.py --model NousResearch/Hermes-3-Llama-3.1-8B --dataset-name random --num-prompts 10 --prefix-len 2 --random-range-ratio 1.0 --input-len 10 --output-len 20
The script used to generate the tables above is here.
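For intuition, the serving-style random sampling adopted here roughly does the following (a sketch of the idea, not the exact code): every prompt shares a random prefix of `--prefix-len` tokens, and per-request input/output lengths are drawn uniformly from a window whose lower bound is scaled by `--random-range-ratio`.

```python
import numpy as np


def sample_random(num_prompts, input_len, output_len, range_ratio,
                  prefix_len, vocab_size, seed=0):
    rng = np.random.default_rng(seed)
    # One shared random prefix prepended to every prompt.
    prefix = rng.integers(0, vocab_size, size=prefix_len).tolist()
    # With range_ratio=1.0 the lengths are exact; with 0.5 they fall
    # uniformly in [len * 0.5, len].
    input_lens = rng.integers(int(input_len * range_ratio), input_len + 1,
                              size=num_prompts)
    output_lens = rng.integers(int(output_len * range_ratio), output_len + 1,
                               size=num_prompts)
    requests = []
    for in_len, out_len in zip(input_lens, output_lens):
        tokens = prefix + rng.integers(0, vocab_size, size=in_len).tolist()
        requests.append((tokens, int(out_len)))
    return requests
```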
This pull request has merge conflicts that must be resolved before it can be merged. Please rebase the PR, @JenZhao.
https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork
Will fix LoRA and re-run the tests later.
Latest testing: checking why the ShareGPT result looks different now.
benchmark_serving.py main branch
| Dataset | Backend | Successful requests | Benchmark duration (s) | Total input tokens |
|---|---|---|---|---|
| sonnet | openai-chat | 10 | 1.54 | 5409 |
| hf-vision-arena | openai-chat | 10 | 2.60 | 7191 |
| hf | openai-chat | 10 | 1.52 | 115 |
| sonnet | vllm | 10 | 1.53 | 5409 |
| sharegpt | vllm | 10 | 6.76 | 1374 |
| random | vllm | 10 | 1.49 | 10240 |
| burstgpt | vllm | 10 | 5.93 | 11970 |
benchmark_serving.py latest change
| Dataset | Backend | Successful requests | Benchmark duration (s) | Total input tokens |
|---|---|---|---|---|
| sonnet | openai-chat | 10 | 1.49 | 5409 |
| hf-vision-arena | openai-chat | 10 | 2.26 | 7191 |
| hf | openai-chat | 10 | 1.26 | 115 |
| sonnet | vllm | 10 | 1.50 | 5409 |
| sharegpt | vllm | 10 | 1.14 | 1960 |
| random | vllm | 10 | 1.45 | 10240 |
| burstgpt | vllm | 10 | 5.84 | 11970 |
benchmark_throughput.py main branch
The main branch only has the random and ShareGPT datasets, and its random dataset generation differs from this branch's.
| Dataset | Processed Prompts | Throughput (requests/s) | Total tokens/s | Output tokens/s |
|---|---|---|---|---|
| random | 10 | 6.83 | 7867.60 | 874.18 |
| ShareGPT_V3_unfiltered_cleaned_split.json | 10 | 1.60 | 725.71 | 345.22 |
| sonnet | 10 | N/A | N/A | N/A |
| burstgpt | 10 | N/A | N/A | N/A |
benchmark_throughput.py latest change
| Dataset | Processed Prompts | Throughput (requests/s) | Total tokens/s | Output tokens/s |
|---|---|---|---|---|
| random | 10 | 6.33 | 7297.28 | 810.81 |
| ShareGPT_V3_unfiltered_cleaned_split.json | 10 | 3.25 | 1639.56 | 1152.57 |
| sonnet | 10 | 6.91 | 4557.63 | 1035.98 |
| burstgpt | 10 | 2.06 | 2853.16 | 386.93 |
testing again
Throughput Benchmark Results, this branch
| Dataset | Processed Prompts | Throughput (requests/s) | Total tokens/s | Output tokens/s |
|---|---|---|---|---|
| random | 10 | 6.80 | 7834.99 | 870.55 |
| ShareGPT_V3_unfiltered_cleaned_split.json | 10 | 3.44 | 883.61 | 434.41 |
| sonnet | 10 | 6.85 | 4496.04 | 1027.27 |
| burstgpt | 10 | 2.10 | 2906.03 | 394.10 |
Throughput Benchmark Results, main branch
- The main branch does not support Sonnet and BurstGPT.
- The main branch’s random dataset definition is different from that of this branch; this branch uses the random dataset definition in the serving script.
- This branch also uses the serving script's ShareGPT sampling, which introduces some very minor differences.
| Dataset | Processed Prompts | Throughput (requests/s) | Total tokens/s | Output tokens/s |
|---|---|---|---|---|
| random | 10 | 6.83 | 7867.60 | 874.18 |
| ShareGPT_V3_unfiltered_cleaned_split.json | 10 | 1.60 | 725.71 | 345.22 |
| sonnet | 10 | N/A | N/A | N/A |
| burstgpt | 10 | N/A | N/A | N/A |
Serving Benchmark Results, this branch
| Dataset | Backend | Successful requests | Benchmark duration (s) | Total input tokens |
|---|---|---|---|---|
| sonnet | openai-chat | 10 | 1.53 | 5409 |
| hf-vision-arena | openai-chat | 10 | 2.61 | 7191 |
| hf | openai-chat | 10 | 1.53 | 115 |
| sonnet | vllm | 10 | 1.54 | 5409 |
| sharegpt | vllm | 10 | 6.53 | 1374 |
| random | vllm | 10 | 1.48 | 10240 |
| burstgpt | vllm | 10 | 5.92 | 11970 |
Serving Benchmark Results, main branch
| Dataset | Backend | Successful requests | Benchmark duration (s) | Total input tokens |
|---|---|---|---|---|
| sonnet | openai-chat | 10 | 1.54 | 5409 |
| hf-vision-arena | openai-chat | 10 | 2.60 | 7191 |
| hf | openai-chat | 10 | 1.52 | 115 |
| sonnet | vllm | 10 | 1.53 | 5409 |
| sharegpt | vllm | 10 | 6.76 | 1374 |
| random | vllm | 10 | 1.49 | 10240 |
| burstgpt | vllm | 10 | 5.93 | 11970 |
LoRA Benchmark Results, this branch
| Dataset | Num Prompts | Max Loras | Max Lora Rank | Enable Lora | Async Engine | Throughput (requests/s) | Total tokens/s | Output tokens/s |
|---|---|---|---|---|---|---|---|---|
| ShareGPT_V... | 10 | 1 | 8 | Yes | No | 1.72 | 705.30 | 327.45 |
| ShareGPT_V... | 10 | 4 | 8 | Yes | No | 1.34 | 663.50 | 230.17 |
| ShareGPT_V... | 10 | N/A | N/A | No | Yes | 1.94 | 1034.53 | 365.42 |
| ShareGPT_V... | 10 | 1 | 8 | Yes | Yes | 2.40 | 1014.93 | 379.04 |
| ShareGPT_V... | 10 | 4 | 8 | Yes | Yes | 0.99 | 633.56 | 359.78 |
LoRA Benchmark Results, main branch
| Dataset | Num Prompts | Max Loras | Max Lora Rank | Enable Lora | Async Engine | Throughput (requests/s) | Total tokens/s | Output tokens/s |
|---|---|---|---|---|---|---|---|---|
| ShareGPT_V... | 10 | 1 | 8 | Yes | No | 1.65 | 819.24 | 467.69 |
| ShareGPT_V... | 10 | 4 | 8 | Yes | No | 1.41 | 736.43 | 240.00 |
| ShareGPT_V... | 10 | N/A | N/A | No | Yes | 1.77 | 773.02 | 375.51 |
| ShareGPT_V... | 10 | 1 | 8 | Yes | Yes | 1.85 | 736.69 | 443.24 |
| ShareGPT_V... | 10 | 4 | 8 | Yes | Yes | 1.76 | 729.75 | 240.61 |
Testing with 1000 requests
LoRA Benchmark Results, this branch
| Dataset | Num Prompts | Max Loras | Max Lora Rank | Enable Lora | Async Engine | Throughput (requests/s) | Total tokens/s | Output tokens/s |
|---|---|---|---|---|---|---|---|---|
| ShareGPT_V... | 1000 | 1 | 8 | Yes | No | 12.05 | 5755.06 | 2816.05 |
| ShareGPT_V... | 1000 | 4 | 8 | Yes | No | 10.85 | 5351.16 | 2539.97 |
| ShareGPT_V... | 1000 | N/A | N/A | No | Yes | 16.52 | 8256.40 | 3949.34 |
| ShareGPT_V... | 1000 | 1 | 8 | Yes | Yes | 11.93 | 5582.34 | 2611.23 |
| ShareGPT_V... | 1000 | 4 | 8 | Yes | Yes | 11.21 | 5364.53 | 2653.42 |
LoRA Benchmark Results, main branch
| Dataset | Num Prompts | Max Loras | Max Lora Rank | Enable Lora | Async Engine | Throughput (requests/s) | Total tokens/s | Output tokens/s |
|---|---|---|---|---|---|---|---|---|
| ShareGPT_V... | 1000 | 1 | 8 | Yes | No | 11.42 | 5434.65 | 2654.52 |
| ShareGPT_V... | 1000 | 4 | 8 | Yes | No | 10.87 | 5368.56 | 2520.88 |
| ShareGPT_V... | 1000 | N/A | N/A | No | Yes | 15.34 | 7528.82 | 3769.25 |
| ShareGPT_V... | 1000 | 1 | 8 | Yes | Yes | 11.60 | 5719.71 | 2573.52 |
| ShareGPT_V... | 1000 | 4 | 8 | Yes | Yes | 10.96 | 5139.18 | 2462.06 |
Testing with 1000 requests
Throughput Results, this branch
| Dataset | Processed Prompts | Throughput (requests/s) | Total tokens/s | Output tokens/s |
|---|---|---|---|---|
| random | 1000 | 25.18 | 29006.31 | 3222.92 |
| ShareGPT_V3_unfiltered_cleaned_split.json | 1000 | 39.72 | 16360.76 | 7468.34 |
| sonnet | 1000 | 50.81 | 33423.33 | 7622.20 |
| burstgpt | 1000 | 14.06 | 15629.87 | 4817.21 |
Throughput Results, main branch
| Dataset | Processed Prompts | Throughput (requests/s) | Total tokens/s | Output tokens/s |
|---|---|---|---|---|
| random | 1000 | 26.14 | 30113.14 | 3345.90 |
| ShareGPT_V3_unfiltered_cleaned_split.json | 1000 | 37.88 | 15570.40 | 7188.78 |
| sonnet | 1000 | N/A | N/A | N/A |
| burstgpt | 1000 | N/A | N/A | N/A |
Serving Results, this branch
| Dataset | Backend | Successful requests | Benchmark duration (s) | Total input tokens |
|---|---|---|---|---|
| sonnet | openai-chat | 1000 | 30.81 | 546875 |
| hf-vision-arena | openai-chat | 500 | 63.45 | 33418 |
| hf | openai-chat | 1000 | 90.91 | 11428 |
| sonnet | vllm | 1000 | 30.43 | 546875 |
| sharegpt | vllm | 1000 | 34.47 | 217393 |
| random | vllm | 1000 | 42.87 | 1024000 |
| burstgpt | vllm | 1000 | 101.00 | 768960 |
Serving Results, main branch
| Dataset | Backend | Successful requests | Benchmark duration (s) | Total input tokens |
|---|---|---|---|---|
| sonnet | openai-chat | 1000 | 32.32 | 546875 |
| hf-vision-arena | openai-chat | 500 | 64.04 | 33418 |
| hf | openai-chat | 1000 | 478.90 | 11428 |
| sonnet | vllm | 1000 | 37.38 | 546875 |
| sharegpt | vllm | 1000 | 37.34 | 217393 |
| random | vllm | 1000 | 59.84 | 1024000 |
| burstgpt | vllm | 1000 | 109.34 | 768960 |
Throughput Results, this branch
ShareGPT numbers do not match; will look into this later.
| Dataset | Processed Prompts | Total Prompt Tokens | Total Tokens | Total Output Tokens | Requests/s | Total Tokens/s | Output Tokens/s |
|---|---|---|---|---|---|---|---|
| random | 10 | 10240 | 11520 | 1280 | 6.74 | 7765.54 | 862.84 |
| ShareGPT_V3_unfiltered_cleaned_split.json | 10 | 1798 | 3710 | 1912 | 2.75 | 1021.74 | 526.57 |
| sonnet | 10 | 5089 | 6589 | 1500 | 6.93 | 4563.60 | 1038.91 |
| burstgpt | 10 | 11970 | 13848 | 1878 | 2.05 | 2839.61 | 385.09 |
Throughput Results, main branch
| Dataset | Processed Prompts | Total Prompt Tokens | Total Tokens | Total Output Tokens | Requests/s | Total Tokens/s | Output Tokens/s |
|---|---|---|---|---|---|---|---|
| random | 10 | 10240 | 11520 | 1280 | 6.85 | 7896.75 | 877.42 |
| ShareGPT_V3_unfiltered_cleaned_split.json | 10 | 2474 | 3751 | 1277 | 2.89 | 1085.52 | 369.56 |
| sonnet | 10 | N/A | N/A | N/A | N/A | N/A | N/A |
| burstgpt | 10 | N/A | N/A | N/A | N/A | N/A | N/A |
We should try to find out why the sampling for ShareGPT differs between main and this branch, since this is actually quite important. Also, can you check with 1000 requests?
OK, they now match after setting the same random seed.
main
| Dataset | Processed Prompts | Total Prompt Tokens | Total Tokens | Total Output Tokens | Requests/s | Total Tokens/s | Output Tokens/s |
|---|---|---|---|---|---|---|---|
| ShareGPT_V3_unfiltered_cleaned_split.json | 1000 | 215196 | 413539 | 198343 | 48.12 | 19901.29 | 9545.12 |
this branch
| Dataset | Processed Prompts | Total Prompt Tokens | Total Tokens | Total Output Tokens | Requests/s | Total Tokens/s | Output Tokens/s |
|---|---|---|---|---|---|---|---|
| ShareGPT_V3_unfiltered_cleaned_split.json | 1000 | 215196 | 413539 | 198343 | 48.57 | 20084.61 | 9633.05 |
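For anyone reproducing this: seeding both RNG sources before sampling is what makes the two branches draw the identical ShareGPT subset. A minimal sketch:

```python
import random

import numpy as np


def set_seed(seed: int) -> None:
    # Seed both RNGs that the benchmark sampling paths draw from.
    random.seed(seed)
    np.random.seed(seed)


set_seed(0)  # same seed on both branches -> identical sampled requests
```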