
Divide by zero: request_metrics[common_metrics.REQ_OUTPUT_THROUGHPUT] = num_output_tokens / request_metrics[common_metrics.E2E_LAT]

Open yaronr opened this issue 1 year ago • 11 comments

Running the benchmark script against llama-3-8b-inst served by djl-serving on Inferentia2 results in:

python3.10 token_benchmark_ray.py \
--model "openai/llama3-8b-inst" \
--mean-input-tokens 550 \
--stddev-input-tokens 150 \
--mean-output-tokens 150 \
--stddev-output-tokens 10 \
--max-num-completed-requests 1 \
--timeout 600 \
--num-concurrent-requests 1 \
--results-dir "result_outputs" \
--llm-api "openai" \
--additional-sampling-params '{}'
Traceback (most recent call last):
  File "/Users/yaron/projects/llmperf/token_benchmark_ray.py", line 456, in <module>
    run_token_benchmark(
  File "/Users/yaron/projects/llmperf/token_benchmark_ray.py", line 297, in run_token_benchmark
    summary, individual_responses = get_token_throughput_latencies(
  File "/Users/yaron/projects/llmperf/token_benchmark_ray.py", line 116, in get_token_throughput_latencies
    request_metrics[common_metrics.REQ_OUTPUT_THROUGHPUT] = num_output_tokens / request_metrics[common_metrics.E2E_LAT]
ZeroDivisionError: division by zero

yaronr avatar Jun 13 '24 09:06 yaronr
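
The division at token_benchmark_ray.py:116 fails when the recorded end-to-end latency is zero, which typically means the request errored out before any tokens arrived. A minimal sketch of a defensive guard, reusing only the names visible in the traceback above (the surrounding code is an assumption, not the project's actual fix):

# Sketch only: request_metrics, num_output_tokens, and common_metrics
# are taken from the traceback; the surrounding code is assumed.
e2e_lat = request_metrics.get(common_metrics.E2E_LAT, 0)
if e2e_lat > 0:
    request_metrics[common_metrics.REQ_OUTPUT_THROUGHPUT] = num_output_tokens / e2e_lat
else:
    # The request likely failed (see the HTTP 422 reports below), so
    # record zero throughput instead of crashing the whole benchmark.
    request_metrics[common_metrics.REQ_OUTPUT_THROUGHPUT] = 0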

same problem here, what did you do to fix this?

sadrafh avatar Aug 05 '24 21:08 sadrafh

same problem here when testing an API deployed in-house

ericg108 avatar Aug 06 '24 02:08 ericg108

same problem here when testing an API deployed in-house

changqingla avatar Aug 09 '24 03:08 changqingla

same problem here when testing an API deployed in-house

hwzhuhao avatar Sep 06 '24 10:09 hwzhuhao

same problem

(OpenAIChatCompletionsClient pid=164504) 422
(OpenAIChatCompletionsClient pid=164507) 422
(OpenAIChatCompletionsClient pid=164510) 422
(OpenAIChatCompletionsClient pid=164506) 422
(OpenAIChatCompletionsClient pid=164500) 422
(OpenAIChatCompletionsClient pid=164502) 422
(OpenAIChatCompletionsClient pid=164498) 422
Traceback (most recent call last):
  File "/home/llmperf/token_benchmark_ray.py", line 462, in <module>
    run_token_benchmark(
  File "/home/llmperf/token_benchmark_ray.py", line 303, in run_token_benchmark
    summary, individual_responses = get_token_throughput_latencies(
  File "/home/llmperf/token_benchmark_ray.py", line 122, in get_token_throughput_latencies
    request_metrics[common_metrics.REQ_OUTPUT_THROUGHPUT] = num_output_tokens / request_metrics[common_metrics.E2E_LAT]
ZeroDivisionError: division by zero
(OpenAIChatCompletionsClient pid=164508) 422
(OpenAIChatCompletionsClient pid=164508) Warning Or Error: 422 Client Error: Unprocessable Content for url: http://x.x.x.x:1025/v1/chat/completions

Eviltuzki avatar Sep 09 '24 02:09 Eviltuzki

same problem here, how to fix this?

kylin-zhou avatar Oct 21 '24 02:10 kylin-zhou

> same problem here, how to fix this?

try printing the response code and content

Eviltuzki avatar Oct 21 '24 02:10 Eviltuzki
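
For anyone unsure how to do that, a minimal sketch using plain requests against an OpenAI-compatible endpoint (the URL, model name, and payload here are placeholders, not values from the project):

import requests

# Placeholder URL/model; substitute your own deployment's values.
resp = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "model": "openai/llama3-8b-inst",
        "messages": [{"role": "user", "content": "Hello"}],
        "max_tokens": 16,
    },
)
print(resp.status_code)  # e.g. 422 Unprocessable Content
print(resp.text)         # the error body usually names the rejected field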

https://github.com/ray-project/llmperf/blob/main/src/llmperf/ray_clients/openai_chat_completions_client.py#L23 My deployed API cannot accept empty request content; deleting this line or adding some content fixes the 422 error.

zhangtf0524 avatar Feb 12 '25 01:02 zhangtf0524
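
If the linked line is the chat message with empty content, the workaround described above might look like this (the exact shape of the message list is an assumption based on the link):

# Before: an empty "content" field, which some servers reject with HTTP 422.
message = [
    {"role": "system", "content": ""},
    {"role": "user", "content": prompt},
]

# After, option 1: drop the empty system message entirely.
message = [{"role": "user", "content": prompt}]

# After, option 2: give the system message non-empty content.
message = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": prompt},
]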

Same problem when testing the deepseek-r1 API.

The deepseek-v3 API, on the other hand, works fine.

Penguin-zlh avatar Feb 20 '25 05:02 Penguin-zlh

same problem when testing deepseek-r1-32b

stg609 avatar Apr 12 '25 09:04 stg609

Here is the fix for this issue: https://github.com/ray-project/llmperf/pull/93

kundan3034 avatar May 28 '25 12:05 kundan3034