Divide by zero: request_metrics[common_metrics.REQ_OUTPUT_THROUGHPUT] = num_output_tokens / request_metrics[common_metrics.E2E_LAT]
Running the benchmark script against llama-3-8b-inst on Inferentia2 (djl-serving) results in:
python3.10 token_benchmark_ray.py \
--model "openai/llama3-8b-inst" \
--mean-input-tokens 550 \
--stddev-input-tokens 150 \
--mean-output-tokens 150 \
--stddev-output-tokens 10 \
--max-num-completed-requests 1 \
--timeout 600 \
--num-concurrent-requests 1 \
--results-dir "result_outputs" \
--llm-api "openai" \
--additional-sampling-params '{}'
Traceback (most recent call last):
File "/Users/yaron/projects/llmperf/token_benchmark_ray.py", line 456, in <module>
run_token_benchmark(
File "/Users/yaron/projects/llmperf/token_benchmark_ray.py", line 297, in run_token_benchmark
summary, individual_responses = get_token_throughput_latencies(
File "/Users/yaron/projects/llmperf/token_benchmark_ray.py", line 116, in get_token_throughput_latencies
request_metrics[common_metrics.REQ_OUTPUT_THROUGHPUT] = num_output_tokens / request_metrics[common_metrics.E2E_LAT]
ZeroDivisionError: division by zero
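A zero E2E latency here usually means the request itself failed and produced no output, so one workaround is to guard the division in get_token_throughput_latencies until the underlying request error is surfaced. A minimal sketch of such a guard as a standalone helper (the names follow the traceback above; this is not the project's own fix):

def safe_output_throughput(num_output_tokens: int, e2e_latency_s: float) -> float:
    """Tokens per second, or 0.0 when the request produced no measurable latency.

    A zero end-to-end latency usually means the request failed (e.g. the server
    returned a 4xx) and no tokens were generated, so report zero throughput
    instead of raising ZeroDivisionError.
    """
    if e2e_latency_s <= 0:
        return 0.0
    return num_output_tokens / e2e_latency_s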
Same problem here. What did you do to fix this?
Same problem here when testing an API deployed in-house.
same problem
(OpenAIChatCompletionsClient pid=164504) 422
(OpenAIChatCompletionsClient pid=164507) 422
(OpenAIChatCompletionsClient pid=164510) 422
(OpenAIChatCompletionsClient pid=164506) 422
(OpenAIChatCompletionsClient pid=164500) 422
(OpenAIChatCompletionsClient pid=164502) 422
(OpenAIChatCompletionsClient pid=164498) 422
Traceback (most recent call last):
File "/home/llmperf/token_benchmark_ray.py", line 462, in
Same problem here. How do I fix this?
Try printing the response status code and content.
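For example, something along these lines shows whether the server is rejecting the request (the endpoint and payload below are illustrative, not the benchmark's actual values):

import requests

# Illustrative endpoint and payload; substitute your deployment's values.
address = "http://localhost:8000/v1/chat/completions"
body = {
    "model": "openai/llama3-8b-inst",
    "messages": [
        {"role": "system", "content": ""},
        {"role": "user", "content": "Hello"},
    ],
    "max_tokens": 16,
}

response = requests.post(address, json=body, timeout=60)
# A 4xx here (e.g. 422) explains the zero output tokens upstream.
print(response.status_code)
print(response.text)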
See https://github.com/ray-project/llmperf/blob/main/src/llmperf/ray_clients/openai_chat_completions_client.py#L23: my deployed API cannot accept empty message content. Deleting this line or giving it some content fixes the 422 error.
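In other words, the change is to give the system message non-empty content (or drop it entirely). A rough sketch of the adjusted message construction, assuming the client builds the request roughly as in the linked file:

# Sketch only: some deployments return HTTP 422 for a system message with
# empty content, so give it content (or remove the system message).
prompt = "Example benchmark prompt"  # stands in for the generated prompt

messages = [
    {"role": "system", "content": "You are a helpful assistant."},  # was ""
    {"role": "user", "content": prompt},
]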
Same problem when testing the deepseek-r1 API. The deepseek-v3 API works fine, though.
Same problem when testing deepseek-r1-32b.
Here is the fix for this issue: https://github.com/ray-project/llmperf/pull/93