1633 ERRORS encountered when running "sh ./reference_mlperf_accuracy.sh"

Open jhsiao1948 opened this issue 5 months ago • 5 comments

================================================
SUT name : PySUT
Scenario : Offline
Mode : PerformanceOnly
Samples per second: 0.857965
Tokens per second: 289.078
Result is : VALID
Min duration satisfied : Yes
Min queries satisfied : Yes
Early stopping satisfied: Yes

================================================
Additional Stats

Min latency (ns) : 2785715494
Max latency (ns) : 1903340943861
Mean latency (ns) : 955159206269
50.00 percentile latency (ns) : 961910964272
90.00 percentile latency (ns) : 1712469406533
95.00 percentile latency (ns) : 1809927827424
97.00 percentile latency (ns) : 1847756820559
99.00 percentile latency (ns) : 1884148081334
99.90 percentile latency (ns) : 1902868349502

================================================
Test Parameters Used

samples_per_query : 1633
target_qps : 1
ttft_latency (ns): 100000000
tpot_latency (ns): 100000000
max_async_queries : 1
min_duration (ms): 600000
max_duration (ms): 0
min_query_count : 1
max_query_count : 0
qsl_rng_seed : 1780908523862526354
sample_index_rng_seed : 14771362308971278857
schedule_rng_seed : 18209322760996052031
accuracy_log_rng_seed : 0
accuracy_log_probability : 0
accuracy_log_sampling_target : 0
print_timestamps : 0
performance_issue_unique : 0
performance_issue_same : 0
performance_issue_same_index : 0
performance_sample_count : 1633
WARNING: sample_concatenate_permutation was set to true. Generated samples per query might be different as the one in the setting.
Check the generated_samples_per_query line in the detailed log for the real samples_per_query value

No warnings encountered during test.

1633 ERRORS encountered. See detailed log.
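As a quick sanity check on the numbers above, the reported throughput and the maximum latency can be compared. This is only a sketch: it assumes that in the Offline scenario all 1633 samples are issued in a single query, so the run length implied by "Samples per second" should be close to the latency of the slowest sample. The variable names are illustrative and not part of loadgen.

```python
# Figures copied from the results summary above.
samples = 1633                       # samples_per_query / performance_sample_count
samples_per_second = 0.857965        # reported Offline throughput
max_latency_ns = 1_903_340_943_861   # reported max latency

# Assumption: Offline issues every sample in one query, so the run time
# implied by the throughput should roughly equal the slowest sample's latency.
implied_seconds = samples / samples_per_second
max_latency_seconds = max_latency_ns / 1e9

print(f"implied run time : {implied_seconds:.1f} s")
print(f"max latency      : {max_latency_seconds:.1f} s")
print(f"mean per sample  : {implied_seconds / samples:.3f} s")
```

The two durations agree to within a second (about 1903 s, or roughly 32 minutes), which works out to a little over one second per sample. That suggests the performance figures themselves are internally consistent, and that the 1633 errors need to be read from the detailed log rather than inferred from the summary.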

Jul 31 '25 17:07 jhsiao1948

Initial part of mlperf_log_detail.txt:

:::MLLOG {"key": "loadgen_version", "value": "5.1.0 @ 50de99161e", "time_ms": 0.006612, "namespace": "mlperf::logging", "event_type": "POINT_IN_TIME", "metadata": {"is_error": false, "is_warning": false, "file": "version.cc", "line_no": 53, "pid": 147525, "tid": 147525}} :::MLLOG {"key": "loadgen_build_date_local", "value": "2025-07-25T14:50:07.881379", "time_ms": 0.006612, "namespace": "mlperf::logging", "event_type": "POINT_IN_TIME", "metadata": {"is_error": false, "is_warning": false, "file": "version.cc", "line_no": 55, "pid": 147525, "tid": 147525}} :::MLLOG {"key": "loadgen_build_date_utc", "value": "2025-07-25T14:50:07.881392", "time_ms": 0.006612, "namespace": "mlperf::logging", "event_type": "POINT_IN_TIME", "metadata": {"is_error": false, "is_warning": false, "file": "version.cc", "line_no": 56, "pid": 147525, "tid": 147525}} :::MLLOG {"key": "loadgen_git_commit_date", "value": "2025-07-23T08:35:44-05:00", "time_ms": 0.006612, "namespace": "mlperf::logging", "event_type": "POINT_IN_TIME", "metadata": {"is_error": false, "is_warning": false, "file": "version.cc", "line_no": 57, "pid": 147525, "tid": 147525}} :::MLLOG {"key": "loadgen_git_log_message", "value": "50de99161e33f32b569c7a00b6ccf56f274d418d Address issue that logger.info not captured by stdout; remove redundant logging (#2278)\n35d9836017ec2aea4b416f085980c81c1e90d682 Update documentation (#2279)\n9a1990e5d161144a1a3a44edb91211de78636bf6 Update download path for DeepSeek-R1 Dataset (#2275)\n7b9643c804dabb253e1fa2b811c700461ca9ed58 Fix SingleStream llama3.1-8b typo (#2274)\nfa32df9a9a4be1eab86774e260a217360a1ff64d Pinning vllm for speech-to-text reference (#2273)\nc57507b1227e1291a0535566d5988d0ab74ff376 Add interactive scenario in the TEST06, bump loadgen version to 5.1 (#2272)\n1446b3501c172153518b53871edbc1a0df014128 Update version generate_final_report.py (#2269)\n5232291860484b747ceeed7a327e56326e3eafe6 Update README.md (#2255)\n7d86e6b8b7564f99fef0c151fdeed7c67b53e392 Update download path 
for llama3.1_8b dataset (#2261)\nbcb600ed0301c23633906edeaa7f4367f2cc700c fix regex (#2260)\nbb0e01a3f47745ce7a5bd516c5064e6e7551076c accuracy (#2259)\n1bc3e998cb29a2ccb7635a5c74c875bf0c3b6432 Increment version to 5.0.25\ne05fda54b31c6859361f5d91660f1c11e6fa847d Add llama3.1-8b-edge as a separated benchmark (#2231)\n24767db549fb6cf0cd506113e34a2a8402ea222f update eval_accuracy.py and deepseek thresholds (#2233)\nae1320c902a4470af5eff581b9119f37665fbca3 Incorrect Regex for RougeLSum (#2230)\n748201149bdffdf1254e042d63cb21c948f8c43a Fix Docs (#2229)", "time_ms": 0.006612, "namespace": "mlperf::logging", "event_type": "POINT_IN_TIME", "metadata": {"is_error": false, "is_warning": false, "file": "version.cc", "line_no": 58, "pid": 147525, "tid": 147525}} :::MLLOG {"key": "loadgen_git_status_message", "value": "", "time_ms": 0.006612, "namespace": "mlperf::logging", "event_type": "POINT_IN_TIME", "metadata": {"is_error": false, "is_warning": false, "file": "version.cc", "line_no": 60, "pid": 147525, "tid": 147525}} :::MLLOG {"key": "loadgen_file_sha1", "value": 
{"/.clang-format":"012aad77e5206c89d50718c46c119d1f3cb056b2","/CMakeLists.txt":"a8ebd64f62d0349aeedbe3295d833ebdce625c2e","/MANIFEST.in":"ddeb472d62edf2920db1f8fa3beebe3e831557f1","/README.md":"e850133bdbbfa62c84bc05a7358114d8996e0530","/README_BUILD.md":"5f6c6a784e9cd6995db47f9b9f70b1769909c9d8","/README_FAQ.md":"01f9ae9887f50bc030dc6107e740f40c43ca388f","/VERSION.txt":"204887433f1f70007f566f5bd6bbacbb68b15a6d","/init.py":"d013101621ef06a0ddc5e7d9ce511918a8b2ebe6","/bindings/c_api.cc":"14d178b64c7fc45d090e038c08d9b78ca943c383","/bindings/c_api.h":"23d9f99e00b2d196e095fae0bb453a391c18d601","/bindings/python_api.cc":"4dae966c92acdaa373b04a95adc4ca353937f154","/diagram_network_submission.png":"53dba8ad4272190ceb6335c12fd25e53dc02a8cb","/diagram_submission.png":"84c2f79309b237cef652aef6a187ba8e875a3952","/early_stopping.cc":"0cd7b546a389deac73f7955cd39255ed76557d62","/early_stopping.h":"158fcae6a5f47e82150d6416fa1f7bcef37e77fe","/issue_query_controller.cc":"02fcfe6d9cf958eeb4b6f1f4dbe87ba7eb4d7dec","/issue_query_controller.h":"ed20934fd3507a15949d501ac154be38e766f6ab","/loadgen.cc":"6daa9cd51454a699fcb55d9aa6bf9e54dd7b7a97","/loadgen.h":"ce9fcb5d44951e7e9048a83b7c1a41c8b8e0f7d8","/loadgen_integration_diagram.svg":"47f748307536f80cfc606947b440dd732afc2637","/logging.cc":"49e63158ebca654fa4b7c5f3321054cf4d6c3a30","/logging.h":"2102c91dedbaa156beadf0cecc63d2f43a2bd7dd","/mlperf.conf":"995a5e32f4e87da6ac0848cbdd8369e4ee4f321f","/mlperf_conf.h":"1cd5c9510eb0593e2721a3f3383e2e9d8a74d7ec","/pyproject.toml":"712fab87b72ba67ef2a068d0f9f47da65130342f","/query_dispatch_library.h":"1f18e9cd3ee4dc89a387cf462de1d0ceb1ece975","/query_sample.h":"c4f399103bc3d172079bbd4cd2b0ca0f22eebc4f","/query_sample_library.h":"8323a2225be1dff31f08ecc86b76eb3de06568bc","/requirements.txt":"a5ff7e77caa6e9e22ada90f0de0c865c987bf167","/results.cc":"fa04efe1049f62262eff7973d49cb2d90a406dcd","/results.h":"fce22d5a588d91fd968a6b25c27896dba87bc276","/setup.py":"a5eaa6f713bd3dfb6603be2c7928f0c295d7ee30","/s
ystem_under_test.h":"18d4809589dae33317d88d9beeb5491a6e1ccdec","/test_settings.h":"8e05582d1fbe9dd2b809686684c3a0ac41248723","/test_settings_internal.cc":"a5cc85fb7735727eee032aa3e88b5d61c1f11a2a","/test_settings_internal.h":"2bb9e9ae53904cb0ca221f4a5d49ca7d9ec3b0ca","/utils.cc":"3df8fdabf6eaea4697cf25d1dcb89cae88e36efd","/utils.h":"40775e32d619ea6356826ae5ea4174c7911f6894","/version.cc":"cbec2a5f98f9786c8c3d8b06b3d12df0b6550fa0","/version.h":"9d574baa64424e9c708fcfedd3dbb0b518a65fcc","/version_generator.py":"9f23d13276194588473120a8a6ecf5a6ed034a23"}, "time_ms": 0.006612, "namespace": "mlperf::logging", "event_type": "POINT_IN_TIME", "metadata": {"is_error": false, "is_warning": false, "file": "version.cc", "line_no": 67, "pid": 147525, "tid": 147525}} :::MLLOG {"key": "test_datetime", "value": "2025-07-29T19:13:06Z", "time_ms": 0.023207, "namespace": "mlperf::logging", "event_type": "POINT_IN_TIME", "metadata": {"is_error": false, "is_warning": false, "file": "loadgen.cc", "line_no": 1194, "pid": 147525, "tid": 147525}} :::MLLOG {"key": "sut_name", "value": "PySUT", "time_ms": 0.023207, "namespace": "mlperf::logging", "event_type": "POINT_IN_TIME", "metadata": {"is_error": false, "is_warning": false, "file": "loadgen.cc", "line_no": 1195, "pid": 147525, "tid": 147525}} :::MLLOG {"key": "get_sut_name_duration_ns", "value": 364, "time_ms": 0.023207, "namespace": "mlperf::logging", "event_type": "POINT_IN_TIME", "metadata": {"is_error": false, "is_warning": false, "file": "loadgen.cc", "line_no": 1196, "pid": 147525, "tid": 147525}} :::MLLOG {"key": "qsl_name", "value": "PyQSL", "time_ms": 0.023207, "namespace": "mlperf::logging", "event_type": "POINT_IN_TIME", "metadata": {"is_error": false, "is_warning": false, "file": "loadgen.cc", "line_no": 1197, "pid": 147525, "tid": 147525}} :::MLLOG {"key": "qsl_reported_total_count", "value": 1633, "time_ms": 0.023207, "namespace": "mlperf::logging", "event_type": "POINT_IN_TIME", "metadata": {"is_error": false, 
"is_warning": false, "file": "loadgen.cc", "line_no": 1198, "pid": 147525, "tid": 147525}} :::MLLOG {"key": "qsl_reported_performance_count", "value": 1633, "time_ms": 0.023207, "namespace": "mlperf::logging", "event_type": "POINT_IN_TIME", "metadata": {"is_error": false, "is_warning": false, "file": "loadgen.cc", "line_no": 1199, "pid": 147525, "tid": 147525}} :::MLLOG {"key": "requested_scenario", "value": "Offline", "time_ms": 0.029172, "namespace": "mlperf::logging", "event_type": "POINT_IN_TIME", "metadata": {"is_error": false, "is_warning": false, "file": "test_settings_internal.cc", "line_no": 272, "pid": 147525, "tid": 147525}} :::MLLOG {"key": "requested_test_mode", "value": "PerformanceOnly", "time_ms": 0.029172, "namespace": "mlperf::logging", "event_type": "POINT_IN_TIME", "metadata": {"is_error": false, "is_warning": false, "file": "test_settings_internal.cc", "line_no": 273, "pid": 147525, "tid": 147525}} :::MLLOG {"key": "requested_offline_expected_qps", "value": 1, "time_ms": 0.029172, "namespace": "mlperf::logging", "event_type": "POINT_IN_TIME", "metadata": {"is_error": false, "is_warning": false, "file": "test_settings_internal.cc", "line_no": 310, "pid": 147525, "tid": 147525}} :::MLLOG {"key": "requested_min_duration_ms", "value": 600000, "time_ms": 0.029172, "namespace": "mlperf::logging", "event_type": "POINT_IN_TIME", "metadata": {"is_error": false, "is_warning": false, "file": "test_settings_internal.cc", "line_no": 316, "pid": 147525, "tid": 147525}} :::MLLOG {"key": "requested_max_duration_ms", "value": 0, "time_ms": 0.029172, "namespace": "mlperf::logging", "event_type": "POINT_IN_TIME", "metadata": {"is_error": false, "is_warning": false, "file": "test_settings_internal.cc", "line_no": 317, "pid": 147525, "tid": 147525}} :::MLLOG {"key": "requested_min_query_count", "value": 1633, "time_ms": 0.029172, "namespace": "mlperf::logging", "event_type": "POINT_IN_TIME", "metadata": {"is_error": false, "is_warning": false, "file": 
"test_settings_internal.cc", "line_no": 318, "pid": 147525, "tid": 147525}} :::MLLOG {"key": "requested_max_query_count", "value": 0, "time_ms": 0.029172, "namespace": "mlperf::logging", "event_type": "POINT_IN_TIME", "metadata": {"is_error": false, "is_warning": false, "file": "test_settings_internal.cc", "line_no": 319, "pid": 147525, "tid": 147525}} @@@

*** If you need more of the log, please let me know. Thanks!
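Since the excerpt above only covers the start of the file, where every record has "is_error": false, the entries behind the "1633 ERRORS" count must be further down. A minimal sketch for pulling them out is shown here. It assumes only what the excerpt shows: each record is a JSON object following a `:::MLLOG` marker, with a `metadata.is_error` flag. The function names and the second sample record (`hypothetical_error`) are made up for illustration and are not real loadgen output.

```python
import json
from collections import Counter

MARKER = ":::MLLOG"

def iter_mllog_records(text):
    """Yield each JSON object that follows a ':::MLLOG' marker in text."""
    decoder = json.JSONDecoder()
    pos = text.find(MARKER)
    while pos != -1:
        start = pos + len(MARKER)
        # Skip the whitespace between the marker and the JSON object.
        while start < len(text) and text[start].isspace():
            start += 1
        try:
            record, end = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            end = start  # malformed or truncated record: move on
        else:
            if isinstance(record, dict):
                yield record
        pos = text.find(MARKER, end)

def error_records(records):
    """Keep only the records whose metadata marks them as errors."""
    return [r for r in records if r.get("metadata", {}).get("is_error")]

# Inline sample: the first record mirrors the excerpt above, the second is
# a made-up error record used only to exercise the filter.
sample = (
    ':::MLLOG {"key": "sut_name", "value": "PySUT", '
    '"metadata": {"is_error": false, "is_warning": false}} '
    ':::MLLOG {"key": "hypothetical_error", "value": "made-up message", '
    '"metadata": {"is_error": true, "is_warning": false}}'
)

records = list(iter_mllog_records(sample))
errors = error_records(records)
counts = Counter(r["key"] for r in errors)
print(counts)
```

To run this against the real file, replace `sample` with the contents of mlperf_log_detail.txt (for example `open("mlperf_log_detail.txt").read()`). Counting by `key` should show whether the 1633 errors are one message repeated once per sample or several different problems.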

Aug 01 '25 12:08 jhsiao1948

Initial part of the console log:

(mlperf) [email protected]:/work/build/inference/speech2text$ head -f -n 500 mlper_loadgen_response.log
head: invalid option -- 'f'
Try 'head --help' for more information.
(mlperf) [email protected]:/work/build/inference/speech2text$ head -h
head: invalid option -- 'h'
Try 'head --help' for more information.
(mlperf) [email protected]:/work/build/inference/speech2text$ head --help
Usage: head [OPTION]... [FILE]...
Print the first 10 lines of each FILE to standard output.
With more than one FILE, precede each with a header giving the file name.

With no FILE, or when FILE is -, read standard input.

Mandatory arguments to long options are mandatory for short options too.
  -c, --bytes=[-]NUM       print the first NUM bytes of each file;
                             with the leading '-', print all but the last
                             NUM bytes of each file
  -n, --lines=[-]NUM       print the first NUM lines instead of the first 10;
                             with the leading '-', print all but the last
                             NUM lines of each file
  -q, --quiet, --silent    never print headers giving file names
  -v, --verbose            always print headers giving file names
  -z, --zero-terminated    line delimiter is NUL, not newline
      --help               display this help and exit
      --version            output version information and exit

NUM may have a multiplier suffix:
b 512, kB 1000, K 1024, MB 1000*1000, M 1024*1024,
GB 1000*1000*1000, G 1024*1024*1024, and so on for T, P, E, Z, Y, R, Q.
Binary prefixes can be used, too: KiB=K, MiB=M, and so on.

GNU coreutils online help: https://www.gnu.org/software/coreutils/ Report any translation bugs to https://translationproject.org/team/ Full documentation https://www.gnu.org/software/coreutils/head or available locally via: info '(coreutils) head invocation' (mlperf) [email protected]:/work/build/inference/speech2text$ head -n 500 mlper_loadgen_response.log Time Start: 1753816338 CORES_PER_INST: 32 NUM_INSTS: 2 START_CORES: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38,40,42,44,46,48,50,52,54,56,58,60,62,64,66,68,70,72,74,76,78,80,82,84,86,88,90,92,94,96,98,100,102,104,106,108,110,112,114,116,118,120,122,124,126,1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39,41,43,45,47,49,51,53,55,57,59,61,63,65,67,69,71,73,75,77,79,81,83,85,87,89,91,93,95,97,99,101,103,105,107,109,111,113,115,117,119,121,123,125,127 INFO 07-29 19:12:22 [init.py:244] Automatically detected platform cuda. Namespace(scenario='Offline', accuracy=False, mlperf_conf='mlperf.conf', user_conf='user.conf', audit_conf='audit.conf', dataset_dir='/work', model_path='openai/whisper-large-v3', manifest='/work//data/dev-all-repack.json', perf_count=None, log_dir='/work/run_output', num_workers=2) Dataset loaded with 10.91 hours. Filtered 0.00 hours. Number of samples: 1633 Binding rank 0 to nodes (0,) Binding rank 0 to cores (0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31) Dataset loaded with 10.91 hours. Filtered 0.00 hours. Number of samples: 1633 Binding rank 1 to nodes (1,) Binding rank 1 to cores (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33) Dataset loaded with 10.91 hours. Filtered 0.00 hours. Number of samples: 1633 pool size 8 Precision: bfloat16 Worker 1: Setting CUDA_VISIBLE_DEVICES=1 INFO 07-29 19:12:39 [config.py:841] This model supports multiple tasks: {'reward', 'generate', 'transcription', 'classify', 'embed'}. 
Defaulting to 'transcription'. WARNING 07-29 19:12:39 [config.py:3371] Casting torch.float16 to torch.bfloat16. INFO 07-29 19:12:39 [config.py:1472] Using max model len 448 WARNING 07-29 19:12:40 [arg_utils.py:1735] ['WhisperForConditionalGeneration'] is not supported by the V1 Engine. Falling back to V0. INFO 07-29 19:12:40 [llm_engine.py:230] Initializing a V0 LLM engine (v0.9.2) with config: model='openai/whisper-large-v3', speculative_config=None, tokenizer='openai/whisper-large-v3', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config={}, tokenizer_revision=None, trust_remote_code=True, dtype=torch.bfloat16, max_seq_len=448, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=1, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, device_config=cuda, decoding_config=DecodingConfig(backend='auto', disable_fallback=False, disable_any_whitespace=False, disable_additional_properties=False, reasoning_backend=''), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None), seed=0, served_model_name=openai/whisper-large-v3, num_scheduler_steps=1, multi_step_stream_outputs=True, enable_prefix_caching=None, chunked_prefill_enabled=False, use_async_output_proc=True, pooler_config=None, compilation_config={"level":0,"debug_dump_path":"","cache_dir":"","backend":"","custom_ops":[],"splitting_ops":[],"use_inductor":true,"compile_sizes":[],"inductor_compile_config":{"enable_auto_functionalized_v2":false},"inductor_passes":{},"use_cudagraph":true,"cudagraph_num_of_warmups":0,"cudagraph_capture_sizes":[64,56,48,40,32,24,16,8,4,2,1],"cudagraph_copy_inputs":false,"full_cuda_graph":false,"max_capture_size":64,"local_cache_dir":null}, use_cached_outputs=False, pool size 8 Precision: bfloat16 Worker 0: Setting CUDA_VISIBLE_DEVICES=0 INFO 07-29 19:12:40 [config.py:841] This model supports 
multiple tasks: {'reward', 'generate', 'transcription', 'classify', 'embed'}. Defaulting to 'transcription'. WARNING 07-29 19:12:40 [config.py:3371] Casting torch.float16 to torch.bfloat16. INFO 07-29 19:12:40 [config.py:1472] Using max model len 448 WARNING 07-29 19:12:40 [arg_utils.py:1735] ['WhisperForConditionalGeneration'] is not supported by the V1 Engine. Falling back to V0. INFO 07-29 19:12:40 [cuda.py:363] Using Flash Attention backend. INFO 07-29 19:12:41 [llm_engine.py:230] Initializing a V0 LLM engine (v0.9.2) with config: model='openai/whisper-large-v3', speculative_config=None, tokenizer='openai/whisper-large-v3', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config={}, tokenizer_revision=None, trust_remote_code=True, dtype=torch.bfloat16, max_seq_len=448, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=1, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, device_config=cuda, decoding_config=DecodingConfig(backend='auto', disable_fallback=False, disable_any_whitespace=False, disable_additional_properties=False, reasoning_backend=''), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None), seed=0, served_model_name=openai/whisper-large-v3, num_scheduler_steps=1, multi_step_stream_outputs=True, enable_prefix_caching=None, chunked_prefill_enabled=False, use_async_output_proc=True, pooler_config=None, compilation_config={"level":0,"debug_dump_path":"","cache_dir":"","backend":"","custom_ops":[],"splitting_ops":[],"use_inductor":true,"compile_sizes":[],"inductor_compile_config":{"enable_auto_functionalized_v2":false},"inductor_passes":{},"use_cudagraph":true,"cudagraph_num_of_warmups":0,"cudagraph_capture_sizes":[64,56,48,40,32,24,16,8,4,2,1],"cudagraph_copy_inputs":false,"full_cuda_graph":false,"max_capture_size":64,"local_cache_dir":null}, 
use_cached_outputs=False, INFO 07-29 19:12:42 [cuda.py:363] Using Flash Attention backend. INFO 07-29 19:12:42 [parallel_state.py:1076] rank 0 in world size 1 is assigned as DP rank 0, PP rank 0, TP rank 0, EP rank 0 INFO 07-29 19:12:42 [model_runner.py:1171] Starting to load model openai/whisper-large-v3... INFO 07-29 19:12:42 [parallel_state.py:1076] rank 0 in world size 1 is assigned as DP rank 0, PP rank 0, TP rank 0, EP rank 0 INFO 07-29 19:12:42 [model_runner.py:1171] Starting to load model openai/whisper-large-v3... INFO 07-29 19:12:43 [weight_utils.py:292] Using model weights format ['.safetensors'] INFO 07-29 19:12:43 [weight_utils.py:345] No model.safetensors.index.json found in remote. Loading safetensors checkpoint shards: 0% Completed | 0/3 [00:00<?, ?it/s] Loading safetensors checkpoint shards: 33% Completed | 1/3 [00:00<00:00, 8.22it/s] INFO 07-29 19:12:43 [weight_utils.py:292] Using model weights format ['.safetensors'] INFO 07-29 19:12:43 [weight_utils.py:345] No model.safetensors.index.json found in remote. Loading safetensors checkpoint shards: 0% Completed | 0/3 [00:00<?, ?it/s] Loading safetensors checkpoint shards: 67% Completed | 2/3 [00:01<00:00, 1.43it/s] Loading safetensors checkpoint shards: 33% Completed | 1/3 [00:01<00:03, 1.60s/it] Loading safetensors checkpoint shards: 67% Completed | 2/3 [00:07<00:04, 4.07s/it] Loading safetensors checkpoint shards: 100% Completed | 3/3 [00:08<00:00, 3.56s/it] Loading safetensors checkpoint shards: 100% Completed | 3/3 [00:08<00:00, 2.73s/it]

INFO 07-29 19:12:51 [default_loader.py:272] Loading weights took 8.29 seconds Loading safetensors checkpoint shards: 100% Completed | 3/3 [00:08<00:00, 2.61s/it] Loading safetensors checkpoint shards: 100% Completed | 3/3 [00:08<00:00, 2.76s/it]

INFO 07-29 19:12:52 [default_loader.py:272] Loading weights took 8.37 seconds INFO 07-29 19:12:52 [model_runner.py:1203] Model loading took 2.8764 GiB and 9.005007 seconds INFO 07-29 19:12:52 [model_runner.py:1203] Model loading took 2.8764 GiB and 9.121790 seconds INFO 07-29 19:12:53 [enc_dec_model_runner.py:315] Starting profile run for multi-modal models. WARNING 07-29 19:12:53 [registry.py:183] WhisperProcessor did not return BatchFeature. Make sure to match the behaviour of ProcessorMixin when implementing custom processors. INFO 07-29 19:12:53 [enc_dec_model_runner.py:315] Starting profile run for multi-modal models. WARNING 07-29 19:12:53 [registry.py:183] WhisperProcessor did not return BatchFeature. Make sure to match the behaviour of ProcessorMixin when implementing custom processors. INFO 07-29 19:12:55 [worker.py:294] Memory profiling takes 3.01 seconds INFO 07-29 19:12:55 [worker.py:294] the current vLLM instance can use total_gpu_memory (39.49GiB) x gpu_memory_utilization (0.80) = 31.60GiB INFO 07-29 19:12:55 [worker.py:294] model weights take 2.88GiB; non_torch_memory takes 0.09GiB; PyTorch activation peak memory takes 2.62GiB; the rest of the memory reserved for KV Cache is 26.00GiB. INFO 07-29 19:12:55 [executor_base.py:113] # cuda blocks: 10650, # CPU blocks: 1638 INFO 07-29 19:12:55 [executor_base.py:118] Maximum concurrency for 448 tokens per request: 380.36x INFO 07-29 19:12:55 [worker.py:294] Memory profiling takes 2.91 seconds INFO 07-29 19:12:55 [worker.py:294] the current vLLM instance can use total_gpu_memory (39.49GiB) x gpu_memory_utilization (0.80) = 31.60GiB INFO 07-29 19:12:55 [worker.py:294] model weights take 2.88GiB; non_torch_memory takes 0.09GiB; PyTorch activation peak memory takes 2.62GiB; the rest of the memory reserved for KV Cache is 26.00GiB. 
INFO 07-29 19:12:56 [executor_base.py:113] # cuda blocks: 10650, # CPU blocks: 1638 INFO 07-29 19:12:56 [executor_base.py:118] Maximum concurrency for 448 tokens per request: 380.36x INFO 07-29 19:12:57 [model_runner.py:1513] Capturing cudagraphs for decoding. This may lead to unexpected consequences if the model is not static. To run the model in eager mode, set 'enforce_eager=True' or use '--enforce-eager' in the CLI. If out-of-memory error occurs during cudagraph capture, consider decreasing gpu_memory_utilization or switching to eager mode. You can also reduce the max_num_seqs as needed to decrease memory usage. Capturing CUDA graph shapes: 0%| | 0/11 [00:00<?, ?it/s]INFO 07-29 19:12:57 [model_runner.py:1513] Capturing cudagraphs for decoding. This may lead to unexpected consequences if the model is not static. To run the model in eager mode, set 'enforce_eager=True' or use '--enforce-eager' in the CLI. If out-of-memory error occurs during cudagraph capture, consider decreasing gpu_memory_utilization or switching to eager mode. You can also reduce the max_num_seqs as needed to decrease memory usage. Capturing CUDA graph shapes: 100%|██████████| 11/11 [00:06<00:00, 1.63it/s] INFO 07-29 19:13:04 [model_runner.py:1671] Graph capturing finished in 7 secs, took 0.12 GiB Capturing CUDA graph shapes: 73%|███████▎ | 8/11 [00:06<00:02, 1.13it/s]INFO 07-29 19:13:04 [llm_engine.py:428] init engine (profile, create kv cache, warmup model) took 11.96 seconds Capturing CUDA graph shapes: 100%|██████████| 11/11 [00:08<00:00, 1.22it/s] INFO 07-29 19:13:06 [model_runner.py:1671] Graph capturing finished in 9 secs, took 0.12 GiB INFO 07-29 19:13:06 [llm_engine.py:428] init engine (profile, create kv cache, warmup model) took 14.37 seconds INFO:SUT:Starting Loadgen response thread Adding requests: 100%|██████████| 1/1 [00:01<00:00, 1.71s/it] Processed prompts: 100%|██████████| 1/1 [00:01<00:00, 1.07s/it, est. 
speed input: 3.75 toks/s, output: 62.83 toks/s] Sample number: 0 | Step time 2.778s Finished 224717888 Adding requests: 100%|██████████| 1/1 [00:00<00:00, 165.12it/s] Processed prompts: 100%|██████████| 1/1 [00:01<00:00, 1.20s/it, est. speed input: 3.35 toks/s, output: 64.43 toks/s] Sample number: 1 | Step time 1.202s Finished 224717920 Adding requests: 100%|██████████| 1/1 [00:00<00:00, 193.78it/s] Processed prompts: 100%|██████████| 1/1 [00:01<00:00, 1.27s/it, est. speed input: 3.16 toks/s, output: 64.82 toks/s] Sample number: 2 | Step time 1.271s Finished 224717952 Adding requests: 100%|██████████| 1/1 [00:00<00:00, 213.91it/s] Processed prompts: 100%|██████████| 1/1 [00:01<00:00, 1.08s/it, est. speed input: 3.69 toks/s, output: 64.65 toks/s] Sample number: 3 | Step time 1.088s Finished 224717984 Adding requests: 100%|██████████| 1/1 [00:00<00:00, 188.63it/s] Processed prompts: 100%|██████████| 1/1 [00:01<00:00, 1.21s/it, est. speed input: 3.32 toks/s, output: 64.65 toks/s] Sample number: 4 | Step time 1.213s Finished 224718016 Adding requests: 100%|██████████| 1/1 [00:00<00:00, 202.80it/s] Processed prompts: 100%|██████████| 1/1 [00:01<00:00, 1.26s/it, est. speed input: 3.16 toks/s, output: 64.85 toks/s] Sample number: 5 | Step time 1.270s Finished 224718048 Adding requests: 100%|██████████| 1/1 [00:00<00:00, 191.63it/s] Processed prompts: 100%|██████████| 1/1 [00:01<00:00, 1.18s/it, est. speed input: 3.40 toks/s, output: 64.65 toks/s] Sample number: 6 | Step time 1.181s Finished 224718080 Adding requests: 100%|██████████| 1/1 [00:00<00:00, 215.81it/s] Processed prompts: 100%|██████████| 1/1 [00:00<00:00, 1.38it/s, est. speed input: 5.51 toks/s, output: 63.37 toks/s] Sample number: 7 | Step time 0.731s Finished 224718112 Adding requests: 100%|██████████| 1/1 [00:00<00:00, 210.03it/s] Processed prompts: 100%|██████████| 1/1 [00:00<00:00, 1.16it/s, est. 
speed input: 4.64 toks/s, output: 63.81 toks/s] Sample number: 8 | Step time 0.867s Finished 224718144 Adding requests: 100%|██████████| 1/1 [00:00<00:00, 261.83it/s] Processed prompts: 100%|██████████| 1/1 [00:00<00:00, 2.05it/s, est. speed input: 8.19 toks/s, output: 61.41 toks/s] Sample number: 9 | Step time 0.493s Finished 224718176 Adding requests: 100%|██████████| 1/1 [00:00<00:00, 216.55it/s] Processed prompts: 100%|██████████| 1/1 [00:00<00:00, 1.37it/s, est. speed input: 5.49 toks/s, output: 63.16 toks/s] Sample number: 10 | Step time 0.734s Finished 224718208 Adding requests: 100%|██████████| 1/1 [00:00<00:00, 183.75it/s] Processed prompts: 100%|██████████| 1/1 [00:01<00:00, 1.26s/it, est. speed input: 3.16 toks/s, output: 64.85 toks/s] Sample number: 11 | Step time 1.271s Finished 224718240 Adding requests: 100%|██████████| 1/1 [00:00<00:00, 233.55it/s] Processed prompts: 100%|██████████| 1/1 [00:00<00:00, 1.57it/s, est. speed input: 6.27 toks/s, output: 62.71 toks/s] Sample number: 12 | Step time 0.643s Finished 224718272 Adding requests: 100%|██████████| 1/1 [00:00<00:00, 202.45it/s] Processed prompts: 100%|██████████| 1/1 [00:01<00:00, 1.06s/it, est. speed input: 3.78 toks/s, output: 64.25 toks/s] Sample number: 13 | Step time 1.064s Finished 224718304 Adding requests: 100%|██████████| 1/1 [00:00<00:00, 186.40it/s] Processed prompts: 100%|██████████| 1/1 [00:01<00:00, 1.13s/it, est. speed input: 3.53 toks/s, output: 64.41 toks/s] Sample number: 14 | Step time 1.139s Finished 224718336 Adding requests: 100%|██████████| 1/1 [00:00<00:00, 222.21it/s] Processed prompts: 100%|██████████| 1/1 [00:01<00:00, 1.00s/it, est. speed input: 4.00 toks/s, output: 64.00 toks/s] Sample number: 15 | Step time 1.005s Finished 224718368 Adding requests: 100%|██████████| 1/1 [00:00<00:00, 202.00it/s] Processed prompts: 100%|██████████| 1/1 [00:01<00:00, 1.24s/it, est. 
speed input: 3.22 toks/s, output: 64.32 toks/s] Sample number: 16 | Step time 1.249s Finished 224718400 Adding requests: 100%|██████████| 1/1 [00:00<00:00, 190.75it/s] Processed prompts: 100%|██████████| 1/1 [00:01<00:00, 1.08s/it, est. speed input: 3.71 toks/s, output: 64.06 toks/s] Sample number: 17 | Step time 1.083s Finished 224718432 Adding requests: 100%|██████████| 1/1 [00:00<00:00, 215.14it/s] Processed prompts: 100%|██████████| 1/1 [00:00<00:00, 1.29it/s, est. speed input: 5.15 toks/s, output: 63.12 toks/s] Sample number: 18 | Step time 0.782s Finished 224718464 Adding requests: 100%|██████████| 1/1 [00:00<00:00, 187.96it/s] Processed prompts: 100%|██████████| 1/1 [00:01<00:00, 1.48s/it, est. speed input: 2.70 toks/s, output: 64.73 toks/s] Sample number: 19 | Step time 1.489s Finished 224718496 Adding requests: 100%|██████████| 1/1 [00:00<00:00, 219.08it/s] Processed prompts: 100%|██████████| 1/1 [00:00<00:00, 1.22it/s, est. speed input: 4.86 toks/s, output: 63.21 toks/s] Sample number: 20 | Step time 0.828s Finished 224718528 Adding requests: 100%|██████████| 1/1 [00:00<00:00, 182.54it/s] Processed prompts: 100%|██████████| 1/1 [00:01<00:00, 1.29s/it, est. speed input: 3.10 toks/s, output: 64.39 toks/s] Sample number: 21 | Step time 1.295s Finished 224718560 Adding requests: 100%|██████████| 1/1 [00:00<00:00, 199.83it/s] Processed prompts: 100%|██████████| 1/1 [00:01<00:00, 1.32s/it, est. speed input: 3.03 toks/s, output: 64.44 toks/s] Sample number: 22 | Step time 1.325s Finished 224718592 Adding requests: 100%|██████████| 1/1 [00:00<00:00, 227.31it/s] Processed prompts: 100%|██████████| 1/1 [00:00<00:00, 1.59it/s, est. speed input: 6.36 toks/s, output: 62.04 toks/s] Sample number: 23 | Step time 0.634s Finished 224718624 Adding requests: 100%|██████████| 1/1 [00:00<00:00, 184.95it/s] Processed prompts: 100%|██████████| 1/1 [00:01<00:00, 1.36s/it, est. 
speed input: 2.94 toks/s, output: 64.60 toks/s] Sample number: 24 | Step time 1.368s Finished 224718656 Adding requests: 100%|██████████| 1/1 [00:00<00:00, 194.86it/s] Processed prompts: 100%|██████████| 1/1 [00:01<00:00, 1.29s/it, est. speed input: 3.11 toks/s, output: 64.49 toks/s] Sample number: 25 | Step time 1.293s Finished 224718688 Adding requests: 100%|██████████| 1/1 [00:00<00:00, 193.71it/s] Processed prompts: 100%|██████████| 1/1 [00:01<00:00, 1.24s/it, est. speed input: 3.22 toks/s, output: 64.44 toks/s] Sample number: 26 | Step time 1.247s Finished 224718720 Adding requests: 100%|██████████| 1/1 [00:00<00:00, 189.27it/s] Processed prompts: 100%|██████████| 1/1 [00:01<00:00, 1.23s/it, est. speed input: 3.26 toks/s, output: 64.34 toks/s] Sample number: 27 | Step time 1.234s Finished 224718752 Adding requests: 100%|██████████| 1/1 [00:00<00:00, 188.26it/s] Processed prompts: 100%|██████████| 1/1 [00:01<00:00, 1.44s/it, est. speed input: 2.79 toks/s, output: 64.75 toks/s] Sample number: 28 | Step time 1.442s Finished 224718784 Adding requests: 100%|██████████| 1/1 [00:00<00:00, 199.22it/s] Processed prompts: 100%|██████████| 1/1 [00:01<00:00, 1.26s/it, est. speed input: 3.19 toks/s, output: 64.55 toks/s] Sample number: 29 | Step time 1.261s Finished 224718816 Adding requests: 100%|██████████| 1/1 [00:00<00:00, 195.25it/s] Processed prompts: 100%|██████████| 1/1 [00:01<00:00, 1.14s/it, est. speed input: 3.52 toks/s, output: 64.30 toks/s] Sample number: 30 | Step time 1.141s Finished 224718848 Adding requests: 100%|██████████| 1/1 [00:00<00:00, 195.21it/s] Processed prompts: 100%|██████████| 1/1 [00:01<00:00, 1.15s/it, est. speed input: 3.48 toks/s, output: 64.32 toks/s] Sample number: 31 | Step time 1.156s Finished 224718880 Adding requests: 100%|██████████| 1/1 [00:00<00:00, 194.69it/s] Processed prompts: 100%|██████████| 1/1 [00:01<00:00, 1.35s/it, est. 
speed input: 2.97 toks/s, output: 64.59 toks/s] Sample number: 32 | Step time 1.353s Finished 224718912 Adding requests: 100%|██████████| 1/1 [00:00<00:00, 194.83it/s] Processed prompts: 100%|██████████| 1/1 [00:01<00:00, 1.20s/it, est. speed input: 3.35 toks/s, output: 64.42 toks/s] Sample number: 33 | Step time 1.201s Finished 224718944 Adding requests: 100%|██████████| 1/1 [00:00<00:00, 189.28it/s] Processed prompts: 100%|██████████| 1/1 [00:01<00:00, 1.41s/it, est. speed input: 2.84 toks/s, output: 64.71 toks/s] Sample number: 34 | Step time 1.412s Finished 224718976 Adding requests: 100%|██████████| 1/1 [00:00<00:00, 193.16it/s] Processed prompts: 100%|██████████| 1/1 [00:01<00:00, 1.44s/it, est. speed input: 2.78 toks/s, output: 64.70 toks/s] Sample number: 35 | Step time 1.443s Finished 224719008 Adding requests: 100%|██████████| 1/1 [00:00<00:00, 187.84it/s] Processed prompts: 100%|██████████| 1/1 [00:01<00:00, 1.63s/it, est. speed input: 2.45 toks/s, output: 64.95 toks/s] Sample number: 36 | Step time 1.638s Finished 224719040 Adding requests: 100%|██████████| 1/1 [00:00<00:00, 209.39it/s] Processed prompts: 100%|██████████| 1/1 [00:01<00:00, 1.09s/it, est. speed input: 3.67 toks/s, output: 64.25 toks/s] Sample number: 37 | Step time 1.095s Finished 224719072 Adding requests: 100%|██████████| 1/1 [00:00<00:00, 202.71it/s] Processed prompts: 100%|██████████| 1/1 [00:00<00:00, 1.14it/s, est. speed input: 4.55 toks/s, output: 63.63 toks/s] Sample number: 38 | Step time 0.886s Finished 224719104 Adding requests: 100%|██████████| 1/1 [00:00<00:00, 251.61it/s] Processed prompts: 100%|██████████| 1/1 [00:00<00:00, 2.59it/s, est. speed input: 10.37 toks/s, output: 59.63 toks/s] Sample number: 39 | Step time 0.390s Finished 224719136 Adding requests: 100%|██████████| 1/1 [00:00<00:00, 180.27it/s] Processed prompts: 100%|██████████| 1/1 [00:01<00:00, 1.33s/it, est. 
speed input: 3.01 toks/s, output: 64.66 toks/s] Sample number: 40 | Step time 1.336s Finished 224719168 Adding requests: 100%|██████████| 1/1 [00:00<00:00, 197.82it/s] Processed prompts: 100%|██████████| 1/1 [00:01<00:00, 1.47s/it, est. speed input: 2.73 toks/s, output: 64.80 toks/s] Sample number: 41 | Step time 1.472s Finished 224719200 Adding requests: 100%|██████████| 1/1 [00:00<00:00, 216.64it/s] Processed prompts: 100%|██████████| 1/1 [00:00<00:00, 1.31it/s, est. speed input: 5.25 toks/s, output: 62.95 toks/s] Sample number: 42 | Step time 0.768s Finished 224719232 Adding requests: 100%|██████████| 1/1 [00:00<00:00, 186.48it/s] Processed prompts: 100%|██████████| 1/1 [00:01<00:00, 1.45s/it, est. speed input: 2.76 toks/s, output: 64.86 toks/s] Sample number: 43 | Step time 1.455s Finished 224719264 Adding requests: 100%|██████████| 1/1 [00:00<00:00, 244.37it/s] Processed prompts: 100%|██████████| 1/1 [00:00<00:00, 1.72it/s, est. speed input: 6.90 toks/s, output: 62.09 toks/s] Sample number: 44 | Step time 0.585s Finished 224719296 Adding requests: 100%|██████████| 1/1 [00:00<00:00, 186.24it/s] Processed prompts: 100%|██████████| 1/1 [00:01<00:00, 1.20s/it, est. speed input: 3.34 toks/s, output: 64.33 toks/s] Sample number: 45 | Step time 1.203s Finished 224719328 Adding requests: 100%|██████████| 1/1 [00:00<00:00, 242.54it/s] Processed prompts: 100%|██████████| 1/1 [00:00<00:00, 1.87it/s, est. speed input: 7.46 toks/s, output: 61.57 toks/s] Sample number: 46 | Step time 0.541s Finished 224719360 Adding requests: 100%|██████████| 1/1 [00:00<00:00, 181.81it/s] Processed prompts: 100%|██████████| 1/1 [00:01<00:00, 1.39s/it, est. speed input: 2.88 toks/s, output: 64.77 toks/s] Sample number: 47 | Step time 1.396s Finished 224719392 Adding requests: 100%|██████████| 1/1 [00:00<00:00, 189.01it/s] Processed prompts: 100%|██████████| 1/1 [00:01<00:00, 1.58s/it, est. 
speed input: 2.54 toks/s, output: 64.74 toks/s] Sample number: 48 | Step time 1.582s Finished 224719424 Adding requests: 100%|██████████| 1/1 [00:00<00:00, 209.35it/s] Processed prompts: 100%|██████████| 1/1 [00:01<00:00, 1.06s/it, est. speed input: 3.78 toks/s, output: 64.18 toks/s] Sample number: 49 | Step time 1.065s Finished 224719456 Adding requests: 100%|██████████| 1/1 [00:00<00:00, 192.17it/s] Processed prompts: 100%|██████████| 1/1 [00:01<00:00, 1.15s/it, est. speed input: 3.48 toks/s, output: 64.30 toks/s] Sample number: 50 | Step time 1.157s Finished 224719488 Adding requests: 100%|██████████| 1/1 [00:00<00:00, 205.24it/s] Processed prompts: 100%|██████████| 1/1 [00:00<00:00, 1.20it/s, est. speed input: 4.79 toks/s, output: 63.49 toks/s] Sample number: 51 | Step time 0.840s Finished 224719520 Adding requests: 100%|██████████| 1/1 [00:00<00:00, 187.82it/s] Processed prompts: 100%|██████████| 1/1 [00:01<00:00, 1.70s/it, est. speed input: 2.35 toks/s, output: 65.11 toks/s] Sample number: 52 | Step time 1.711s Finished 224719552 Adding requests: 100%|██████████| 1/1 [00:00<00:00, 193.31it/s] Processed prompts: 100%|██████████| 1/1 [00:01<00:00, 1.35s/it, est. speed input: 2.97 toks/s, output: 64.66 toks/s] Sample number: 53 | Step time 1.351s Finished 224719584 Adding requests: 100%|██████████| 1/1 [00:00<00:00, 199.24it/s] Processed prompts: 100%|██████████| 1/1 [00:01<00:00, 1.28s/it, est. speed input: 3.11 toks/s, output: 64.63 toks/s] Sample number: 54 | Step time 1.290s Finished 224719616 Adding requests: 100%|██████████| 1/1 [00:00<00:00, 195.37it/s] Processed prompts: 100%|██████████| 1/1 [00:00<00:00, 1.20it/s, est. speed input: 4.79 toks/s, output: 63.41 toks/s] Sample number: 55 | Step time 0.842s Finished 224719648 Adding requests: 100%|██████████| 1/1 [00:00<00:00, 192.79it/s] Processed prompts: 100%|██████████| 1/1 [00:01<00:00, 1.24s/it, est. 
speed input: 3.22 toks/s, output: 64.47 toks/s] Sample number: 56 | Step time 1.247s Finished 224719680 Adding requests: 100%|██████████| 1/1 [00:00<00:00, 188.98it/s] Processed prompts: 100%|██████████| 1/1 [00:01<00:00, 1.13s/it, est. speed input: 3.53 toks/s, output: 64.34 toks/s] Sample number: 57 | Step time 1.141s Finished 224719712 Adding requests: 100%|██████████| 1/1 [00:00<00:00, 224.61it/s] Processed prompts: 100%|██████████| 1/1 [00:00<00:00, 1.06it/s, est. speed input: 4.26 toks/s, output: 63.83 toks/s] Sample number: 58 | Step time 0.945s Finished 224719744 Adding requests: 100%|██████████| 1/1 [00:00<00:00, 193.56it/s] Processed prompts: 100%|██████████| 1/1 [00:00<00:00, 1.00it/s, est. speed input: 4.00 toks/s, output: 64.01 toks/s] Sample number: 59 | Step time 1.006s Finished 224719776 Adding requests: 100%|██████████| 1/1 [00:00<00:00, 205.65it/s] Processed prompts: 100%|██████████| 1/1 [00:00<00:00, 1.18it/s, est. speed input: 4.70 toks/s, output: 63.48 toks/s] Sample number: 60 | Step time 0.856s Finished 224719808 Adding requests: 100%|██████████| 1/1 [00:00<00:00, 184.71it/s] Processed prompts: 100%|██████████| 1/1 [00:01<00:00, 1.56s/it, est. speed input: 2.57 toks/s, output: 64.85 toks/s] Sample number: 61 | Step time 1.563s Finished 224719840 Adding requests: 100%|██████████| 1/1 [00:00<00:00, 198.17it/s] Processed prompts: 100%|██████████| 1/1 [00:01<00:00, 1.47s/it, est. speed input: 2.73 toks/s, output: 64.78 toks/s] Sample number: 62 | Step time 1.472s Finished 224719872 Adding requests: 100%|██████████| 1/1 [00:00<00:00, 195.62it/s] Processed prompts: 100%|██████████| 1/1 [00:01<00:00, 1.13s/it, est. speed input: 3.53 toks/s, output: 64.33 toks/s] Sample number: 63 | Step time 1.140s Finished 224719904 Adding requests: 100%|██████████| 1/1 [00:00<00:00, 201.72it/s] Processed prompts: 100%|██████████| 1/1 [00:00<00:00, 1.01it/s, est. 
speed input: 4.06 toks/s, output: 63.90 toks/s] Sample number: 64 | Step time 0.992s Finished 224719936 Adding requests: 100%|██████████| 1/1 [00:00<00:00, 185.35it/s] Processed prompts: 100%|██████████| 1/1 [00:01<00:00, 1.44s/it, est. speed input: 2.79 toks/s, output: 64.75 toks/s] Sample number: 65 | Step time 1.442s Finished 224719968 Adding requests: 100%|██████████| 1/1 [00:00<00:00, 190.16it/s] Processed prompts: 100%|██████████| 1/1 [00:01<00:00, 1.59s/it, est. speed input: 2.52 toks/s, output: 64.94 toks/s] Sample number: 66 | Step time 1.592s Finished 224720000 Adding requests: 100%|██████████| 1/1 [00:00<00:00, 191.19it/s] Processed prompts: 100%|██████████| 1/1 [00:01<00:00, 1.20s/it, est. speed input: 3.35 toks/s, output: 64.41 toks/s] Sample number: 67 | Step time 1.201s Finished 224720032 Adding requests: 100%|██████████| 1/1 [00:00<00:00, 188.58it/s] Processed prompts: 100%|██████████| 1/1 [00:01<00:00, 1.27s/it, est. speed input: 3.15 toks/s, output: 64.56 toks/s] Sample number: 68 | Step time 1.276s Finished 224720064 Adding requests: 100%|██████████| 1/1 [00:00<00:00, 214.59it/s] Processed prompts: 100%|██████████| 1/1 [00:00<00:00, 1.19it/s, est. speed input: 4.78 toks/s, output: 63.33 toks/s] Sample number: 69 | Step time 0.842s Finished 224720096 Adding requests: 100%|██████████| 1/1 [00:00<00:00, 196.73it/s] Processed prompts: 100%|██████████| 1/1 [00:01<00:00, 1.02s/it, est. speed input: 3.94 toks/s, output: 64.00 toks/s] Sample number: 70 | Step time 1.021s Finished 224720128 Adding requests: 100%|██████████| 1/1 [00:00<00:00, 196.17it/s] Processed prompts: 100%|██████████| 1/1 [00:01<00:00, 1.32s/it, est. speed input: 3.04 toks/s, output: 64.59 toks/s] Sample number: 71 | Step time 1.322s Finished 224720160 Adding requests: 100%|██████████| 1/1 [00:00<00:00, 192.34it/s] Processed prompts: 100%|██████████| 1/1 [00:01<00:00, 1.52s/it, est. 
speed input: 2.62 toks/s, output: 64.94 toks/s] Sample number: 72 | Step time 1.530s Finished 224720192 Adding requests: 100%|██████████| 1/1 [00:00<00:00, 213.15it/s] Processed prompts: 100%|██████████| 1/1 [00:00<00:00, 1.16it/s, est. speed input: 4.63 toks/s, output: 63.66 toks/s] Sample number: 73 | Step time 0.869s Finished 224720224 Adding requests: 100%|██████████| 1/1 [00:00<00:00, 191.16it/s] Processed prompts: 100%|██████████| 1/1 [00:01<00:00, 1.09s/it, est. speed input: 3.67 toks/s, output: 64.14 toks/s] Sample number: 74 | Step time 1.097s Finished 224720256 Adding requests: 100%|██████████| 1/1 [00:00<00:00, 197.65it/s] Processed prompts: 100%|██████████| 1/1 [00:01<00:00, 1.11s/it, est. speed input: 3.62 toks/s, output: 64.20 toks/s] Sample number: 75 | Step time 1.112s Finished 224720288 Adding requests: 100%|██████████| 1/1 [00:00<00:00, 186.91it/s] Processed prompts: 100%|██████████| 1/1 [00:01<00:00, 1.56s/it, est. speed input: 2.57 toks/s, output: 64.95 toks/s] Sample number: 76 | Step time 1.561s Finished 224720320 Adding requests: 100%|██████████| 1/1 [00:00<00:00, 196.83it/s] Processed prompts: 100%|██████████| 1/1 [00:01<00:00, 1.28s/it, est. speed input: 3.12 toks/s, output: 64.64 toks/s] Sample number: 77 | Step time 1.290s Finished 224720352 Adding requests: 100%|██████████| 1/1 [00:00<00:00, 206.25it/s] Processed prompts: 100%|██████████| 1/1 [00:01<00:00, 1.18s/it, est. speed input: 3.39 toks/s, output: 64.41 toks/s] Sample number: 78 | Step time 1.185s Finished 224720384 Adding requests: 100%|██████████| 1/1 [00:00<00:00, 190.01it/s] Processed prompts: 100%|██████████| 1/1 [00:01<00:00, 1.29s/it, est. speed input: 3.11 toks/s, output: 64.56 toks/s] Sample number: 79 | Step time 1.292s Finished 224720416 Adding requests: 100%|██████████| 1/1 [00:00<00:00, 187.13it/s] Processed prompts: 100%|██████████| 1/1 [00:01<00:00, 1.12s/it, est. 
speed input: 3.57 toks/s, output: 64.32 toks/s] Sample number: 80 | Step time 1.125s Finished 224720448 Adding requests: 100%|██████████| 1/1 [00:00<00:00, 192.42it/s] Processed prompts: 100%|██████████| 1/1 [00:01<00:00, 1.76s/it, est. speed input: 2.27 toks/s, output: 65.19 toks/s] Sample number: 81 | Step time 1.770s Finished 224720480 Adding requests: 100%|██████████| 1/1 [00:00<00:00, 187.57it/s] Processed prompts: 100%|██████████| 1/1 [00:01<00:00, 1.06s/it, est. speed input: 3.78 toks/s, output: 64.20 toks/s] Sample number: 82 | Step time 1.065s Finished 224720512 Adding requests: 100%|██████████| 1/1 [00:00<00:00, 191.22it/s] Processed prompts: 100%|██████████| 1/1 [00:01<00:00, 1.29s/it, est. speed input: 3.11 toks/s, output: 64.50 toks/s] Sample number: 83 | Step time 1.293s Finished 224720544 Adding requests: 100%|██████████| 1/1 [00:00<00:00, 188.00it/s] Processed prompts: 100%|██████████| 1/1 [00:01<00:00, 1.26s/it, est. speed input: 3.18 toks/s, output: 64.49 toks/s] Sample number: 84 | Step time 1.262s Finished 224720576 Adding requests: 100%|██████████| 1/1 [00:00<00:00, 192.91it/s] Processed prompts: 100%|██████████| 1/1 [00:01<00:00, 1.26s/it, est. speed input: 3.19 toks/s, output: 64.51 toks/s] Sample number: 85 | Step time 1.261s Finished 224720608 Adding requests: 100%|██████████| 1/1 [00:00<00:00, 194.69it/s] Processed prompts: 100%|██████████| 1/1 [00:01<00:00, 1.01s/it, est. speed input: 3.94 toks/s, output: 64.09 toks/s] Sample number: 86 | Step time 1.020s Finished 224720640 Adding requests: 100%|██████████| 1/1 [00:00<00:00, 192.11it/s] Processed prompts: 100%|██████████| 1/1 [00:01<00:00, 1.29s/it, est. speed input: 3.11 toks/s, output: 64.58 toks/s] Sample number: 87 | Step time 1.291s Finished 224720672 Adding requests: 100%|██████████| 1/1 [00:00<00:00, 187.13it/s] Processed prompts: 100%|██████████| 1/1 [00:01<00:00, 1.62s/it, est. 
speed input: 2.48 toks/s, output: 64.97 toks/s] Sample number: 88 | Step time 1.622s Finished 224720704 Adding requests: 100%|██████████| 1/1 [00:00<00:00, 193.13it/s] Processed prompts: 100%|██████████| 1/1 [00:01<00:00, 1.32s/it, est. speed input: 3.04 toks/s, output: 64.59 toks/s] Sample number: 89 | Step time 1.322s Finished 224720736 Adding requests: 100%|██████████| 1/1 [00:00<00:00, 193.79it/s] Processed prompts: 100%|██████████| 1/1 [00:01<00:00, 1.36s/it, est. speed input: 2.94 toks/s, output: 64.66 toks/s] Sample number: 90 | Step time 1.367s Finished 224720768 Adding requests: 100%|██████████| 1/1 [00:00<00:00, 199.90it/s] Processed prompts: 100%|██████████| 1/1 [00:01<00:00, 1.22s/it, est. speed input: 3.27 toks/s, output: 64.53 toks/s] Sample number: 91 | Step time 1.230s Finished 224720800 Adding requests: 100%|██████████| 1/1 [00:00<00:00, 187.21it/s] Processed prompts: 100%|██████████| 1/1 [00:02<00:00, 2.22s/it, est. speed input: 1.80 toks/s, output: 65.41 toks/s] Sample number: 92 | Step time 2.223s Finished 224720832 Adding requests: 100%|██████████| 1/1 [00:00<00:00, 188.45it/s] Processed prompts: 100%|██████████| 1/1 [00:01<00:00, 1.45s/it, est. speed input: 2.75 toks/s, output: 64.70 toks/s] Sample number: 93 | Step time 1.459s Finished 224720864 Adding requests: 100%|██████████| 1/1 [00:00<00:00, 191.28it/s] Processed prompts: 100%|██████████| 1/1 [00:01<00:00, 1.02s/it, est. speed input: 3.94 toks/s, output: 64.03 toks/s] Sample number: 94 | Step time 1.021s Finished 224720896 Adding requests: 100%|██████████| 1/1 [00:00<00:00, 196.20it/s] Processed prompts: 100%|██████████| 1/1 [00:01<00:00, 1.33s/it, est. speed input: 3.01 toks/s, output: 64.64 toks/s] Sample number: 95 | Step time 1.336s Finished 224720928 Adding requests: 100%|██████████| 1/1 [00:00<00:00, 206.63it/s] Processed prompts: 100%|██████████| 1/1 [00:00<00:00, 1.01it/s, est. 
speed input: 4.06 toks/s, output: 63.88 toks/s] Sample number: 96 | Step time 0.992s Finished 224720960 Adding requests: 100%|██████████| 1/1 [00:00<00:00, 189.78it/s] Processed prompts: 100%|██████████| 1/1 [00:00<00:00, 1.34it/s, est. speed input: 5.37 toks/s, output: 63.08 toks/s] Sample number: 97 | Step time 0.751s Finished 224720992 Adding requests: 100%|██████████| 1/1 [00:00<00:00, 227.48it/s] Processed prompts: 100%|██████████| 1/1 [00:00<00:00, 1.22it/s, est. speed input: 4.87 toks/s, output: 63.35 toks/s] Sample number: 98 | Step time 0.826s Finished 224721024 Adding requests: 100%|██████████| 1/1 [00:00<00:00, 182.99it/s] Processed prompts: 100%|██████████| 1/1 [00:01<00:00, 1.21s/it, est. speed input: 3.31 toks/s, output: 64.48 toks/s] Sample number: 99 | Step time 1.216s Finished 224721056 Adding requests: 100%|██████████| 1/1 [00:00<00:00, 189.28it/s] Processed prompts: 100%|██████████| 1/1 [00:01<00:00, 1.57s/it, est. speed input: 2.55 toks/s, output: 64.97 toks/s] Sample number: 100 | Step time 1.576s Finished 224721088 Adding requests: 100%|██████████| 1/1 [00:00<00:00, 232.47it/s] Processed prompts: 100%|██████████| 1/1 [00:00<00:00, 1.64it/s, est. speed input: 6.55 toks/s, output: 62.19 toks/s] Sample number: 101 | Step time 0.616s Finished 224721120 Adding requests: 100%|██████████| 1/1 [00:00<00:00, 184.43it/s] Processed prompts: 100%|██████████| 1/1 [00:01<00:00, 1.39s/it, est. speed input: 2.87 toks/s, output: 64.65 toks/s] Sample number: 102 | Step time 1.398s Finished 224721152 Adding requests: 100%|██████████| 1/1 [00:00<00:00, 189.65it/s] Processed prompts: 100%|██████████| 1/1 [00:01<00:00, 1.53s/it, est. speed input: 2.62 toks/s, output: 64.81 toks/s] Sample number: 103 | Step time 1.534s Finished 224721184 Adding requests: 100%|██████████| 1/1 [00:00<00:00, 186.69it/s] Processed prompts: 100%|██████████| 1/1 [00:01<00:00, 1.26s/it, est. 
speed input: 3.18 toks/s, output: 64.48 toks/s] Sample number: 104 | Step time 1.262s Finished 224721216 Adding requests: 100%|██████████| 1/1 [00:00<00:00, 204.09it/s] Processed prompts: 100%|██████████| 1/1 [00:01<00:00, 1.27s/it, est. speed input: 3.15 toks/s, output: 64.52 toks/s] (mlperf) [email protected]:/work/build/inference/speech2text$
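
The per-sample step times in the log above can be summarized with a short script. This is only a sketch: the regular expression is an assumption based on the "Sample number: N | Step time X.XXXs" fragments as they appear in this paste, and `step_times` is a hypothetical helper, not part of the reference implementation.

```python
import re
from statistics import mean

# Assumed pattern, taken from the fragments visible in the pasted log:
#   "Sample number: 24 | Step time 1.368s"
STEP_RE = re.compile(r"Sample number:\s*(\d+)\s*\|\s*Step time\s*([0-9.]+)s")


def step_times(log_text):
    """Return {sample_number: step_time_in_seconds} for every match in log_text."""
    return {int(n): float(t) for n, t in STEP_RE.findall(log_text)}


# Three records copied from the log above, progress bars omitted for brevity.
sample_log = (
    "Sample number: 24 | Step time 1.368s Finished 224718656 "
    "Sample number: 25 | Step time 1.293s Finished 224718688 "
    "Sample number: 26 | Step time 1.247s Finished 224718720"
)

times = step_times(sample_log)
print(f"{len(times)} samples, mean step time {mean(times.values()):.3f}s")
```

Running it over the full console output instead of `sample_log` would give the average time per sample for the whole run.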

Aug 01 '25 14:08 jhsiao1948

I believe this PR would have fixed this issue.

Aug 05 '25 22:08 arjunsuresh

Hi Arjun,

Yes, that fixed the issue. Thanks!

Two questions about running "sh ./reference_mlperf_accuracy.sh". The test bed has two A100 GPUs. First, most of the time only gpu-0 was utilized, and it ran at low utilization, about 34%. Second, gpu-1 was never utilized. Are there ways to fix these two issues? If you want me to open another ticket, please let me know.

Thanks! Jean

Aug 06 '25 14:08 jhsiao1948

@jhsiao1948 Reference implementations are typically not optimised, which implies low utilisation of resources. Submitters usually create optimised implementations for their submissions. You can expect to find well-optimised implementations when the v5.1 results are published in mid-September. Having said that, A100 GPUs are pretty old by now, so your mileage may vary.
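
To confirm per-GPU utilisation while the script runs, something like the following may help. It is a minimal sketch, assuming `nvidia-smi` is on the PATH and reports a numeric `utilization.gpu` for each device; `parse_utilization` and `read_utilization` are hypothetical helper names, and the sample string only mirrors the figures reported in this thread (about 34% on gpu-0, nothing on gpu-1).

```python
import subprocess

# nvidia-smi query that prints one "index, utilization" line per GPU.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,utilization.gpu",
    "--format=csv,noheader,nounits",
]


def parse_utilization(csv_text):
    """Parse lines such as '0, 34' into {gpu_index: utilization_percent}."""
    result = {}
    for line in csv_text.strip().splitlines():
        index, util = (field.strip() for field in line.split(","))
        # Assumes a numeric value; a device reporting "[N/A]" would raise here.
        result[int(index)] = int(util)
    return result


def read_utilization():
    """Run nvidia-smi once and return the parsed utilization per GPU."""
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return parse_utilization(out)


# Illustrative input only, not captured output from the reporter's machine.
example = parse_utilization("0, 34\n1, 0\n")
idle = [gpu for gpu, util in example.items() if util == 0]
print(f"utilization: {example}, idle GPUs: {idle}")
```

Calling `read_utilization()` in a loop during the run would show whether the second GPU ever picks up work.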

Aug 08 '25 01:08 psyhtest