
Divide by zero: request_metrics[common_metrics.REQ_OUTPUT_THROUGHPUT] = num_output_tokens / request_metrics[common_metrics.E2E_LAT]

Open yaronr opened this issue 1 year ago • 11 comments

Running the benchmark script against llama-3-8b-inst served by djl-serving on Inferentia2 results in:

python3.10 token_benchmark_ray.py \
--model "openai/llama3-8b-inst" \
--mean-input-tokens 550 \
--stddev-input-tokens 150 \
--mean-output-tokens 150 \
--stddev-output-tokens 10 \
--max-num-completed-requests 1 \
--timeout 600 \
--num-concurrent-requests 1 \
--results-dir "result_outputs" \
--llm-api "openai" \
--additional-sampling-params '{}'
Traceback (most recent call last):
  File "/Users/yaron/projects/llmperf/token_benchmark_ray.py", line 456, in <module>
    run_token_benchmark(
  File "/Users/yaron/projects/llmperf/token_benchmark_ray.py", line 297, in run_token_benchmark
    summary, individual_responses = get_token_throughput_latencies(
  File "/Users/yaron/projects/llmperf/token_benchmark_ray.py", line 116, in get_token_throughput_latencies
    request_metrics[common_metrics.REQ_OUTPUT_THROUGHPUT] = num_output_tokens / request_metrics[common_metrics.E2E_LAT]
ZeroDivisionError: division by zero

yaronr avatar Jun 13 '24 09:06 yaronr
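
The division at token_benchmark_ray.py:116 fails when the recorded end-to-end latency is zero, which typically means the request errored out before any tokens arrived. A minimal sketch of a defensive guard, reusing only the names visible in the traceback above (the surrounding code is an assumption, not the project's actual fix):

# Sketch only: request_metrics, num_output_tokens, and common_metrics
# are taken from the traceback; the surrounding code is assumed.
e2e_lat = request_metrics.get(common_metrics.E2E_LAT, 0)
if e2e_lat > 0:
    request_metrics[common_metrics.REQ_OUTPUT_THROUGHPUT] = num_output_tokens / e2e_lat
else:
    # The request likely failed (see the HTTP 422 reports below), so
    # record zero throughput instead of crashing the whole benchmark.
    request_metrics[common_metrics.REQ_OUTPUT_THROUGHPUT] = 0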

same problem here, what did you do to fix this?

sadrafh avatar Aug 05 '24 21:08 sadrafh

same problem here when testing an API deployed in-house

ericg108 avatar Aug 06 '24 02:08 ericg108

same problem here when testing an API deployed in-house

changqingla avatar Aug 09 '24 03:08 changqingla

same problem here when testing an API deployed in-house

hwzhuhao avatar Sep 06 '24 10:09 hwzhuhao

same problem

(OpenAIChatCompletionsClient pid=164504) 422
(OpenAIChatCompletionsClient pid=164507) 422
(OpenAIChatCompletionsClient pid=164510) 422
(OpenAIChatCompletionsClient pid=164506) 422
(OpenAIChatCompletionsClient pid=164500) 422
(OpenAIChatCompletionsClient pid=164502) 422
(OpenAIChatCompletionsClient pid=164498) 422
Traceback (most recent call last):
  File "/home/llmperf/token_benchmark_ray.py", line 462, in <module>
    run_token_benchmark(
  File "/home/llmperf/token_benchmark_ray.py", line 303, in run_token_benchmark
    summary, individual_responses = get_token_throughput_latencies(
  File "/home/llmperf/token_benchmark_ray.py", line 122, in get_token_throughput_latencies
    request_metrics[common_metrics.REQ_OUTPUT_THROUGHPUT] = num_output_tokens / request_metrics[common_metrics.E2E_LAT]
ZeroDivisionError: division by zero
(OpenAIChatCompletionsClient pid=164508) 422
(OpenAIChatCompletionsClient pid=164508) Warning Or Error: 422 Client Error: Unprocessable Content for url: http://x.x.x.x:1025/v1/chat/completions

Eviltuzki avatar Sep 09 '24 02:09 Eviltuzki

same problem here, how to fix this?

kylin-zhou avatar Oct 21 '24 02:10 kylin-zhou

> same problem here, how to fix this?

try printing the response code and content

Eviltuzki avatar Oct 21 '24 02:10 Eviltuzki
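
For anyone unsure how to do that, a minimal sketch using plain requests against an OpenAI-compatible endpoint (the URL, model name, and payload here are placeholders, not values from the project):

import requests

# Placeholder URL/model; substitute your own deployment's values.
resp = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "model": "openai/llama3-8b-inst",
        "messages": [{"role": "user", "content": "Hello"}],
        "max_tokens": 16,
    },
)
print(resp.status_code)  # e.g. 422 Unprocessable Content
print(resp.text)         # the error body usually names the rejected field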

https://github.com/ray-project/llmperf/blob/main/src/llmperf/ray_clients/openai_chat_completions_client.py#L23 My deployed API cannot accept empty request content; deleting this line or adding some content fixes the 422 error.

zhangtf0524 avatar Feb 12 '25 01:02 zhangtf0524
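
If the linked line is the chat message with empty content, the workaround described above might look like this (the exact shape of the message list is an assumption based on the link):

# Before: an empty "content" field, which some servers reject with HTTP 422.
message = [
    {"role": "system", "content": ""},
    {"role": "user", "content": prompt},
]

# After, option 1: drop the empty system message entirely.
message = [{"role": "user", "content": prompt}]

# After, option 2: give the system message non-empty content.
message = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": prompt},
]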

Same problem when testing the deepseek-r1 API.

The deepseek-v3 API, on the other hand, works fine.

Penguin-zlh avatar Feb 20 '25 05:02 Penguin-zlh

same problem when testing deepseek-r1-32b

stg609 avatar Apr 12 '25 09:04 stg609

Here is the fix for this issue: https://github.com/ray-project/llmperf/pull/93

kundan3034 avatar May 28 '25 12:05 kundan3034