text-generation-inference
text-generation-inference copied to clipboard
Nightly load test results
This issue collects load test results runs on nightly builds.
🚀 Load test results are in for commit bee5616cfc2a93e721bfc06a58bebf13f85cae45
Variable length prompts
Constant length prompts
Delta to last release
| Metric | Input Type | Test Type | Avg Delta (%) |
|---|---|---|---|
| requests_ok | sharegpt_conversations | constant_vus | 0 |
| error_rate | sharegpt_conversations | constant_vus | 0 |
| end_to_end_latency | sharegpt_conversations | constant_vus | 0 |
| inter_token_latency | sharegpt_conversations | constant_vus | 0 |
| time_to_first_token | sharegpt_conversations | constant_vus | 0 |
| tokens_throughput | sharegpt_conversations | constant_vus | 0 |
| requests_ok | sharegpt_conversations | constant_arrival_rate | 0 |
| error_rate | sharegpt_conversations | constant_arrival_rate | 0 |
| end_to_end_latency | sharegpt_conversations | constant_arrival_rate | 0 |
| inter_token_latency | sharegpt_conversations | constant_arrival_rate | 0 |
| time_to_first_token | sharegpt_conversations | constant_arrival_rate | 0 |
| tokens_throughput | sharegpt_conversations | constant_arrival_rate | 0 |
| requests_ok | constant_tokens | constant_vus | 0 |
| error_rate | constant_tokens | constant_vus | 0 |
| end_to_end_latency | constant_tokens | constant_vus | 0 |
| inter_token_latency | constant_tokens | constant_vus | 0 |
| time_to_first_token | constant_tokens | constant_vus | 0 |
| tokens_throughput | constant_tokens | constant_vus | 0 |
| requests_ok | constant_tokens | constant_arrival_rate | 0 |
| error_rate | constant_tokens | constant_arrival_rate | 0 |
| end_to_end_latency | constant_tokens | constant_arrival_rate | 0 |
| inter_token_latency | constant_tokens | constant_arrival_rate | 0 |
| time_to_first_token | constant_tokens | constant_arrival_rate | 0 |
| tokens_throughput | constant_tokens | constant_arrival_rate | 0 |
🚀 Load test results are in for commit ceac61a0267331875b355fd439c091c77fff139f
Variable length prompts
Constant length prompts
Delta to last release
| Metric | Input Type | Test Type | Avg Delta (%) |
|---|---|---|---|
| requests_ok | sharegpt_conversations | constant_vus | 0 |
| error_rate | sharegpt_conversations | constant_vus | 0 |
| end_to_end_latency | sharegpt_conversations | constant_vus | 0 |
| inter_token_latency | sharegpt_conversations | constant_vus | 0 |
| time_to_first_token | sharegpt_conversations | constant_vus | 0 |
| tokens_throughput | sharegpt_conversations | constant_vus | 0 |
| requests_ok | sharegpt_conversations | constant_arrival_rate | 0 |
| error_rate | sharegpt_conversations | constant_arrival_rate | 0 |
| end_to_end_latency | sharegpt_conversations | constant_arrival_rate | 0 |
| inter_token_latency | sharegpt_conversations | constant_arrival_rate | 0 |
| time_to_first_token | sharegpt_conversations | constant_arrival_rate | 0 |
| tokens_throughput | sharegpt_conversations | constant_arrival_rate | 0 |
| requests_ok | constant_tokens | constant_vus | 0 |
| error_rate | constant_tokens | constant_vus | 0 |
| end_to_end_latency | constant_tokens | constant_vus | 0 |
| inter_token_latency | constant_tokens | constant_vus | 0 |
| time_to_first_token | constant_tokens | constant_vus | 0 |
| tokens_throughput | constant_tokens | constant_vus | 0 |
| requests_ok | constant_tokens | constant_arrival_rate | 0 |
| error_rate | constant_tokens | constant_arrival_rate | 0 |
| end_to_end_latency | constant_tokens | constant_arrival_rate | 0 |
| inter_token_latency | constant_tokens | constant_arrival_rate | 0 |
| time_to_first_token | constant_tokens | constant_arrival_rate | 0 |
| tokens_throughput | constant_tokens | constant_arrival_rate | 0 |
🚀 Load test results are in for commit 893bd3c924c341a3b7a1c53a5a6bd3379501bd2d
Variable length prompts
Constant length prompts
Delta to last release
| Metric | Input Type | Test Type | Avg Delta (%) |
|---|---|---|---|
| requests_ok | sharegpt_conversations | constant_vus | 0 |
| error_rate | sharegpt_conversations | constant_vus | 0 |
| end_to_end_latency | sharegpt_conversations | constant_vus | 0 |
| inter_token_latency | sharegpt_conversations | constant_vus | 0 |
| time_to_first_token | sharegpt_conversations | constant_vus | 0 |
| tokens_throughput | sharegpt_conversations | constant_vus | 0 |
| requests_ok | sharegpt_conversations | constant_arrival_rate | 0 |
| error_rate | sharegpt_conversations | constant_arrival_rate | 0 |
| end_to_end_latency | sharegpt_conversations | constant_arrival_rate | 0 |
| inter_token_latency | sharegpt_conversations | constant_arrival_rate | 0 |
| time_to_first_token | sharegpt_conversations | constant_arrival_rate | 0 |
| tokens_throughput | sharegpt_conversations | constant_arrival_rate | 0 |
| requests_ok | constant_tokens | constant_vus | 0 |
| error_rate | constant_tokens | constant_vus | 0 |
| end_to_end_latency | constant_tokens | constant_vus | 0 |
| inter_token_latency | constant_tokens | constant_vus | 0 |
| time_to_first_token | constant_tokens | constant_vus | 0 |
| tokens_throughput | constant_tokens | constant_vus | 0 |
| requests_ok | constant_tokens | constant_arrival_rate | 0 |
| error_rate | constant_tokens | constant_arrival_rate | 0 |
| end_to_end_latency | constant_tokens | constant_arrival_rate | 0 |
| inter_token_latency | constant_tokens | constant_arrival_rate | 0 |
| time_to_first_token | constant_tokens | constant_arrival_rate | 0 |
| tokens_throughput | constant_tokens | constant_arrival_rate | 0 |
🚀 Load test results are in for commit 4e7c0c63c6d2a671ff2306a84212aeb339f1e92b
Variable length prompts
Constant length prompts
Delta to last release
| Metric | Input Type | Test Type | Avg Delta (%) |
|---|---|---|---|
| requests_ok | sharegpt_conversations | constant_vus | 0 |
| error_rate | sharegpt_conversations | constant_vus | 0 |
| end_to_end_latency | sharegpt_conversations | constant_vus | 0 |
| inter_token_latency | sharegpt_conversations | constant_vus | 0 |
| time_to_first_token | sharegpt_conversations | constant_vus | 0 |
| tokens_throughput | sharegpt_conversations | constant_vus | 0 |
| requests_ok | sharegpt_conversations | constant_arrival_rate | 0 |
| error_rate | sharegpt_conversations | constant_arrival_rate | 0 |
| end_to_end_latency | sharegpt_conversations | constant_arrival_rate | 0 |
| inter_token_latency | sharegpt_conversations | constant_arrival_rate | 0 |
| time_to_first_token | sharegpt_conversations | constant_arrival_rate | 0 |
| tokens_throughput | sharegpt_conversations | constant_arrival_rate | 0 |
| requests_ok | constant_tokens | constant_vus | 0 |
| error_rate | constant_tokens | constant_vus | 0 |
| end_to_end_latency | constant_tokens | constant_vus | 0 |
| inter_token_latency | constant_tokens | constant_vus | 0 |
| time_to_first_token | constant_tokens | constant_vus | 0 |
| tokens_throughput | constant_tokens | constant_vus | 0 |
| requests_ok | constant_tokens | constant_arrival_rate | 0 |
| error_rate | constant_tokens | constant_arrival_rate | 0 |
| end_to_end_latency | constant_tokens | constant_arrival_rate | 0 |
| inter_token_latency | constant_tokens | constant_arrival_rate | 0 |
| time_to_first_token | constant_tokens | constant_arrival_rate | 0 |
| tokens_throughput | constant_tokens | constant_arrival_rate | 0 |