1633 ERRORS encountered when running "sh ./reference_mlperf_accuracy.sh"
================================================
SUT name : PySUT
Scenario : Offline
Mode : PerformanceOnly
Samples per second: 0.857965
Tokens per second: 289.078
Result is : VALID
Min duration satisfied : Yes
Min queries satisfied : Yes
Early stopping satisfied: Yes
================================================
Additional Stats
Min latency (ns) : 2785715494
Max latency (ns) : 1903340943861
Mean latency (ns) : 955159206269
50.00 percentile latency (ns) : 961910964272
90.00 percentile latency (ns) : 1712469406533
95.00 percentile latency (ns) : 1809927827424
97.00 percentile latency (ns) : 1847756820559
99.00 percentile latency (ns) : 1884148081334
99.90 percentile latency (ns) : 1902868349502
================================================
Test Parameters Used
samples_per_query : 1633
target_qps : 1
ttft_latency (ns): 100000000
tpot_latency (ns): 100000000
max_async_queries : 1
min_duration (ms): 600000
max_duration (ms): 0
min_query_count : 1
max_query_count : 0
qsl_rng_seed : 1780908523862526354
sample_index_rng_seed : 14771362308971278857
schedule_rng_seed : 18209322760996052031
accuracy_log_rng_seed : 0
accuracy_log_probability : 0
accuracy_log_sampling_target : 0
print_timestamps : 0
performance_issue_unique : 0
performance_issue_same : 0
performance_issue_same_index : 0
performance_sample_count : 1633
WARNING: sample_concatenate_permutation was set to true. Generated samples per query might be different as the one in the setting. Check the generated_samples_per_query line in the detailed log for the real samples_per_query value
No warnings encountered during test.
1633 ERRORS encountered. See detailed log.
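As a quick sanity check on the summary above, the reported throughput appears consistent with the sample count divided by the maximum latency. The sketch below assumes that this is how the Offline figure is derived (the summary itself does not state it); the constants are copied from the summary, and the function names are only illustrative.

```python
# Values copied from the summary above.
SAMPLES = 1633                       # samples_per_query / performance_sample_count
MAX_LATENCY_NS = 1_903_340_943_861   # "Max latency (ns)"
REPORTED_SAMPLES_PER_SECOND = 0.857965


def ns_to_seconds(ns: int) -> float:
    """Convert a nanosecond count to seconds."""
    return ns / 1e9


def offline_throughput(samples: int, max_latency_ns: int) -> float:
    """Samples per second, ASSUMING throughput = samples / max latency."""
    return samples / ns_to_seconds(max_latency_ns)


computed = offline_throughput(SAMPLES, MAX_LATENCY_NS)
print(f"max latency: {ns_to_seconds(MAX_LATENCY_NS):.1f} s")
print(f"computed:    {computed:.6f} samples/s")
print(f"reported:    {REPORTED_SAMPLES_PER_SECOND:.6f} samples/s")
```

If the two figures agree, the performance numbers are internally consistent, which suggests the 1633 errors (one per sample) come from something other than the timing itself.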
Initial part of mlperf_log_detail.txt:
:::MLLOG {"key": "loadgen_version", "value": "5.1.0 @ 50de99161e", "time_ms": 0.006612, "namespace": "mlperf::logging", "event_type": "POINT_IN_TIME", "metadata": {"is_error": false, "is_warning": false, "file": "version.cc", "line_no": 53, "pid": 147525, "tid": 147525}} :::MLLOG {"key": "loadgen_build_date_local", "value": "2025-07-25T14:50:07.881379", "time_ms": 0.006612, "namespace": "mlperf::logging", "event_type": "POINT_IN_TIME", "metadata": {"is_error": false, "is_warning": false, "file": "version.cc", "line_no": 55, "pid": 147525, "tid": 147525}} :::MLLOG {"key": "loadgen_build_date_utc", "value": "2025-07-25T14:50:07.881392", "time_ms": 0.006612, "namespace": "mlperf::logging", "event_type": "POINT_IN_TIME", "metadata": {"is_error": false, "is_warning": false, "file": "version.cc", "line_no": 56, "pid": 147525, "tid": 147525}} :::MLLOG {"key": "loadgen_git_commit_date", "value": "2025-07-23T08:35:44-05:00", "time_ms": 0.006612, "namespace": "mlperf::logging", "event_type": "POINT_IN_TIME", "metadata": {"is_error": false, "is_warning": false, "file": "version.cc", "line_no": 57, "pid": 147525, "tid": 147525}} :::MLLOG {"key": "loadgen_git_log_message", "value": "50de99161e33f32b569c7a00b6ccf56f274d418d Address issue that logger.info not captured by stdout; remove redundant logging (#2278)\n35d9836017ec2aea4b416f085980c81c1e90d682 Update documentation (#2279)\n9a1990e5d161144a1a3a44edb91211de78636bf6 Update download path for DeepSeek-R1 Dataset (#2275)\n7b9643c804dabb253e1fa2b811c700461ca9ed58 Fix SingleStream llama3.1-8b typo (#2274)\nfa32df9a9a4be1eab86774e260a217360a1ff64d Pinning vllm for speech-to-text reference (#2273)\nc57507b1227e1291a0535566d5988d0ab74ff376 Add interactive scenario in the TEST06, bump loadgen version to 5.1 (#2272)\n1446b3501c172153518b53871edbc1a0df014128 Update version generate_final_report.py (#2269)\n5232291860484b747ceeed7a327e56326e3eafe6 Update README.md (#2255)\n7d86e6b8b7564f99fef0c151fdeed7c67b53e392 Update download path 
for llama3.1_8b dataset (#2261)\nbcb600ed0301c23633906edeaa7f4367f2cc700c fix regex (#2260)\nbb0e01a3f47745ce7a5bd516c5064e6e7551076c accuracy (#2259)\n1bc3e998cb29a2ccb7635a5c74c875bf0c3b6432 Increment version to 5.0.25\ne05fda54b31c6859361f5d91660f1c11e6fa847d Add llama3.1-8b-edge as a separated benchmark (#2231)\n24767db549fb6cf0cd506113e34a2a8402ea222f update eval_accuracy.py and deepseek thresholds (#2233)\nae1320c902a4470af5eff581b9119f37665fbca3 Incorrect Regex for RougeLSum (#2230)\n748201149bdffdf1254e042d63cb21c948f8c43a Fix Docs (#2229)", "time_ms": 0.006612, "namespace": "mlperf::logging", "event_type": "POINT_IN_TIME", "metadata": {"is_error": false, "is_warning": false, "file": "version.cc", "line_no": 58, "pid": 147525, "tid": 147525}} :::MLLOG {"key": "loadgen_git_status_message", "value": "", "time_ms": 0.006612, "namespace": "mlperf::logging", "event_type": "POINT_IN_TIME", "metadata": {"is_error": false, "is_warning": false, "file": "version.cc", "line_no": 60, "pid": 147525, "tid": 147525}} :::MLLOG {"key": "loadgen_file_sha1", "value": 
{"/.clang-format":"012aad77e5206c89d50718c46c119d1f3cb056b2","/CMakeLists.txt":"a8ebd64f62d0349aeedbe3295d833ebdce625c2e","/MANIFEST.in":"ddeb472d62edf2920db1f8fa3beebe3e831557f1","/README.md":"e850133bdbbfa62c84bc05a7358114d8996e0530","/README_BUILD.md":"5f6c6a784e9cd6995db47f9b9f70b1769909c9d8","/README_FAQ.md":"01f9ae9887f50bc030dc6107e740f40c43ca388f","/VERSION.txt":"204887433f1f70007f566f5bd6bbacbb68b15a6d","/init.py":"d013101621ef06a0ddc5e7d9ce511918a8b2ebe6","/bindings/c_api.cc":"14d178b64c7fc45d090e038c08d9b78ca943c383","/bindings/c_api.h":"23d9f99e00b2d196e095fae0bb453a391c18d601","/bindings/python_api.cc":"4dae966c92acdaa373b04a95adc4ca353937f154","/diagram_network_submission.png":"53dba8ad4272190ceb6335c12fd25e53dc02a8cb","/diagram_submission.png":"84c2f79309b237cef652aef6a187ba8e875a3952","/early_stopping.cc":"0cd7b546a389deac73f7955cd39255ed76557d62","/early_stopping.h":"158fcae6a5f47e82150d6416fa1f7bcef37e77fe","/issue_query_controller.cc":"02fcfe6d9cf958eeb4b6f1f4dbe87ba7eb4d7dec","/issue_query_controller.h":"ed20934fd3507a15949d501ac154be38e766f6ab","/loadgen.cc":"6daa9cd51454a699fcb55d9aa6bf9e54dd7b7a97","/loadgen.h":"ce9fcb5d44951e7e9048a83b7c1a41c8b8e0f7d8","/loadgen_integration_diagram.svg":"47f748307536f80cfc606947b440dd732afc2637","/logging.cc":"49e63158ebca654fa4b7c5f3321054cf4d6c3a30","/logging.h":"2102c91dedbaa156beadf0cecc63d2f43a2bd7dd","/mlperf.conf":"995a5e32f4e87da6ac0848cbdd8369e4ee4f321f","/mlperf_conf.h":"1cd5c9510eb0593e2721a3f3383e2e9d8a74d7ec","/pyproject.toml":"712fab87b72ba67ef2a068d0f9f47da65130342f","/query_dispatch_library.h":"1f18e9cd3ee4dc89a387cf462de1d0ceb1ece975","/query_sample.h":"c4f399103bc3d172079bbd4cd2b0ca0f22eebc4f","/query_sample_library.h":"8323a2225be1dff31f08ecc86b76eb3de06568bc","/requirements.txt":"a5ff7e77caa6e9e22ada90f0de0c865c987bf167","/results.cc":"fa04efe1049f62262eff7973d49cb2d90a406dcd","/results.h":"fce22d5a588d91fd968a6b25c27896dba87bc276","/setup.py":"a5eaa6f713bd3dfb6603be2c7928f0c295d7ee30","/s
ystem_under_test.h":"18d4809589dae33317d88d9beeb5491a6e1ccdec","/test_settings.h":"8e05582d1fbe9dd2b809686684c3a0ac41248723","/test_settings_internal.cc":"a5cc85fb7735727eee032aa3e88b5d61c1f11a2a","/test_settings_internal.h":"2bb9e9ae53904cb0ca221f4a5d49ca7d9ec3b0ca","/utils.cc":"3df8fdabf6eaea4697cf25d1dcb89cae88e36efd","/utils.h":"40775e32d619ea6356826ae5ea4174c7911f6894","/version.cc":"cbec2a5f98f9786c8c3d8b06b3d12df0b6550fa0","/version.h":"9d574baa64424e9c708fcfedd3dbb0b518a65fcc","/version_generator.py":"9f23d13276194588473120a8a6ecf5a6ed034a23"}, "time_ms": 0.006612, "namespace": "mlperf::logging", "event_type": "POINT_IN_TIME", "metadata": {"is_error": false, "is_warning": false, "file": "version.cc", "line_no": 67, "pid": 147525, "tid": 147525}} :::MLLOG {"key": "test_datetime", "value": "2025-07-29T19:13:06Z", "time_ms": 0.023207, "namespace": "mlperf::logging", "event_type": "POINT_IN_TIME", "metadata": {"is_error": false, "is_warning": false, "file": "loadgen.cc", "line_no": 1194, "pid": 147525, "tid": 147525}} :::MLLOG {"key": "sut_name", "value": "PySUT", "time_ms": 0.023207, "namespace": "mlperf::logging", "event_type": "POINT_IN_TIME", "metadata": {"is_error": false, "is_warning": false, "file": "loadgen.cc", "line_no": 1195, "pid": 147525, "tid": 147525}} :::MLLOG {"key": "get_sut_name_duration_ns", "value": 364, "time_ms": 0.023207, "namespace": "mlperf::logging", "event_type": "POINT_IN_TIME", "metadata": {"is_error": false, "is_warning": false, "file": "loadgen.cc", "line_no": 1196, "pid": 147525, "tid": 147525}} :::MLLOG {"key": "qsl_name", "value": "PyQSL", "time_ms": 0.023207, "namespace": "mlperf::logging", "event_type": "POINT_IN_TIME", "metadata": {"is_error": false, "is_warning": false, "file": "loadgen.cc", "line_no": 1197, "pid": 147525, "tid": 147525}} :::MLLOG {"key": "qsl_reported_total_count", "value": 1633, "time_ms": 0.023207, "namespace": "mlperf::logging", "event_type": "POINT_IN_TIME", "metadata": {"is_error": false, 
"is_warning": false, "file": "loadgen.cc", "line_no": 1198, "pid": 147525, "tid": 147525}} :::MLLOG {"key": "qsl_reported_performance_count", "value": 1633, "time_ms": 0.023207, "namespace": "mlperf::logging", "event_type": "POINT_IN_TIME", "metadata": {"is_error": false, "is_warning": false, "file": "loadgen.cc", "line_no": 1199, "pid": 147525, "tid": 147525}} :::MLLOG {"key": "requested_scenario", "value": "Offline", "time_ms": 0.029172, "namespace": "mlperf::logging", "event_type": "POINT_IN_TIME", "metadata": {"is_error": false, "is_warning": false, "file": "test_settings_internal.cc", "line_no": 272, "pid": 147525, "tid": 147525}} :::MLLOG {"key": "requested_test_mode", "value": "PerformanceOnly", "time_ms": 0.029172, "namespace": "mlperf::logging", "event_type": "POINT_IN_TIME", "metadata": {"is_error": false, "is_warning": false, "file": "test_settings_internal.cc", "line_no": 273, "pid": 147525, "tid": 147525}} :::MLLOG {"key": "requested_offline_expected_qps", "value": 1, "time_ms": 0.029172, "namespace": "mlperf::logging", "event_type": "POINT_IN_TIME", "metadata": {"is_error": false, "is_warning": false, "file": "test_settings_internal.cc", "line_no": 310, "pid": 147525, "tid": 147525}} :::MLLOG {"key": "requested_min_duration_ms", "value": 600000, "time_ms": 0.029172, "namespace": "mlperf::logging", "event_type": "POINT_IN_TIME", "metadata": {"is_error": false, "is_warning": false, "file": "test_settings_internal.cc", "line_no": 316, "pid": 147525, "tid": 147525}} :::MLLOG {"key": "requested_max_duration_ms", "value": 0, "time_ms": 0.029172, "namespace": "mlperf::logging", "event_type": "POINT_IN_TIME", "metadata": {"is_error": false, "is_warning": false, "file": "test_settings_internal.cc", "line_no": 317, "pid": 147525, "tid": 147525}} :::MLLOG {"key": "requested_min_query_count", "value": 1633, "time_ms": 0.029172, "namespace": "mlperf::logging", "event_type": "POINT_IN_TIME", "metadata": {"is_error": false, "is_warning": false, "file": 
"test_settings_internal.cc", "line_no": 318, "pid": 147525, "tid": 147525}} :::MLLOG {"key": "requested_max_query_count", "value": 0, "time_ms": 0.029172, "namespace": "mlperf::logging", "event_type": "POINT_IN_TIME", "metadata": {"is_error": false, "is_warning": false, "file": "test_settings_internal.cc", "line_no": 319, "pid": 147525, "tid": 147525}} @@@
*** If you need more of this log, please let me know. Thanks!
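Since the excerpt above only covers the start of the file, the entries flagged as errors can be pulled out of the full mlperf_log_detail.txt with a few lines of Python. This is a minimal sketch, assuming every entry follows the `:::MLLOG {json}` shape shown above and that errors are marked with `"is_error": true` in `metadata`; the helper names are hypothetical, not part of LoadGen.

```python
import json
from typing import Dict, Iterator, List

MARKER = ":::MLLOG"


def iter_mllog_entries(text: str) -> Iterator[Dict]:
    """Yield each JSON object that follows a ':::MLLOG' marker.

    Works whether entries are one per line or run together on one line.
    """
    decoder = json.JSONDecoder()
    for chunk in text.split(MARKER)[1:]:
        try:
            # raw_decode parses the leading JSON value and ignores any
            # trailing text (for example a truncation marker).
            entry, _ = decoder.raw_decode(chunk.lstrip())
        except json.JSONDecodeError:
            continue  # skip truncated or damaged entries
        if isinstance(entry, dict):
            yield entry


def error_entries(text: str) -> List[Dict]:
    """Return only the entries whose metadata marks them as errors."""
    return [
        entry
        for entry in iter_mllog_entries(text)
        if entry.get("metadata", {}).get("is_error")
    ]


# Example usage (the path is an assumption based on log_dir in the console log):
# with open("/work/run_output/mlperf_log_detail.txt") as f:
#     for entry in error_entries(f.read()):
#         print(entry["key"], entry["value"])
```

Printing the `key` and `value` of each error entry should show what the 1633 errors have in common.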
Initial part of the console log:
(mlperf) [email protected]:/work/build/inference/speech2text$ head -f -n 500 mlper_loadgen_response.log
head: invalid option -- 'f'
Try 'head --help' for more information.
(mlperf) [email protected]:/work/build/inference/speech2text$ head -h
head: invalid option -- 'h'
Try 'head --help' for more information.
(mlperf) [email protected]:/work/build/inference/speech2text$ head --help
Usage: head [OPTION]... [FILE]...
Print the first 10 lines of each FILE to standard output.
With more than one FILE, precede each with a header giving the file name.
With no FILE, or when FILE is -, read standard input.
Mandatory arguments to long options are mandatory for short options too.
-c, --bytes=[-]NUM print the first NUM bytes of each file; with the leading '-', print all but the last NUM bytes of each file
-n, --lines=[-]NUM print the first NUM lines instead of the first 10; with the leading '-', print all but the last NUM lines of each file
-q, --quiet, --silent never print headers giving file names
-v, --verbose always print headers giving file names
-z, --zero-terminated line delimiter is NUL, not newline
--help display this help and exit
--version output version information and exit
NUM may have a multiplier suffix: b 512, kB 1000, K 1024, MB 1000*1000, M 1024*1024, GB 1000*1000*1000, G 1024*1024*1024, and so on for T, P, E, Z, Y, R, Q. Binary prefixes can be used, too: KiB=K, MiB=M, and so on.
GNU coreutils online help: https://www.gnu.org/software/coreutils/ Report any translation bugs to https://translationproject.org/team/ Full documentation https://www.gnu.org/software/coreutils/head or available locally via: info '(coreutils) head invocation' (mlperf) [email protected]:/work/build/inference/speech2text$ head -n 500 mlper_loadgen_response.log Time Start: 1753816338 CORES_PER_INST: 32 NUM_INSTS: 2 START_CORES: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38,40,42,44,46,48,50,52,54,56,58,60,62,64,66,68,70,72,74,76,78,80,82,84,86,88,90,92,94,96,98,100,102,104,106,108,110,112,114,116,118,120,122,124,126,1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39,41,43,45,47,49,51,53,55,57,59,61,63,65,67,69,71,73,75,77,79,81,83,85,87,89,91,93,95,97,99,101,103,105,107,109,111,113,115,117,119,121,123,125,127 INFO 07-29 19:12:22 [init.py:244] Automatically detected platform cuda. Namespace(scenario='Offline', accuracy=False, mlperf_conf='mlperf.conf', user_conf='user.conf', audit_conf='audit.conf', dataset_dir='/work', model_path='openai/whisper-large-v3', manifest='/work//data/dev-all-repack.json', perf_count=None, log_dir='/work/run_output', num_workers=2) Dataset loaded with 10.91 hours. Filtered 0.00 hours. Number of samples: 1633 Binding rank 0 to nodes (0,) Binding rank 0 to cores (0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31) Dataset loaded with 10.91 hours. Filtered 0.00 hours. Number of samples: 1633 Binding rank 1 to nodes (1,) Binding rank 1 to cores (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33) Dataset loaded with 10.91 hours. Filtered 0.00 hours. Number of samples: 1633 pool size 8 Precision: bfloat16 Worker 1: Setting CUDA_VISIBLE_DEVICES=1 INFO 07-29 19:12:39 [config.py:841] This model supports multiple tasks: {'reward', 'generate', 'transcription', 'classify', 'embed'}. 
Defaulting to 'transcription'. WARNING 07-29 19:12:39 [config.py:3371] Casting torch.float16 to torch.bfloat16. INFO 07-29 19:12:39 [config.py:1472] Using max model len 448 WARNING 07-29 19:12:40 [arg_utils.py:1735] ['WhisperForConditionalGeneration'] is not supported by the V1 Engine. Falling back to V0. INFO 07-29 19:12:40 [llm_engine.py:230] Initializing a V0 LLM engine (v0.9.2) with config: model='openai/whisper-large-v3', speculative_config=None, tokenizer='openai/whisper-large-v3', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config={}, tokenizer_revision=None, trust_remote_code=True, dtype=torch.bfloat16, max_seq_len=448, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=1, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, device_config=cuda, decoding_config=DecodingConfig(backend='auto', disable_fallback=False, disable_any_whitespace=False, disable_additional_properties=False, reasoning_backend=''), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None), seed=0, served_model_name=openai/whisper-large-v3, num_scheduler_steps=1, multi_step_stream_outputs=True, enable_prefix_caching=None, chunked_prefill_enabled=False, use_async_output_proc=True, pooler_config=None, compilation_config={"level":0,"debug_dump_path":"","cache_dir":"","backend":"","custom_ops":[],"splitting_ops":[],"use_inductor":true,"compile_sizes":[],"inductor_compile_config":{"enable_auto_functionalized_v2":false},"inductor_passes":{},"use_cudagraph":true,"cudagraph_num_of_warmups":0,"cudagraph_capture_sizes":[64,56,48,40,32,24,16,8,4,2,1],"cudagraph_copy_inputs":false,"full_cuda_graph":false,"max_capture_size":64,"local_cache_dir":null}, use_cached_outputs=False, pool size 8 Precision: bfloat16 Worker 0: Setting CUDA_VISIBLE_DEVICES=0 INFO 07-29 19:12:40 [config.py:841] This model supports 
multiple tasks: {'reward', 'generate', 'transcription', 'classify', 'embed'}. Defaulting to 'transcription'. WARNING 07-29 19:12:40 [config.py:3371] Casting torch.float16 to torch.bfloat16. INFO 07-29 19:12:40 [config.py:1472] Using max model len 448 WARNING 07-29 19:12:40 [arg_utils.py:1735] ['WhisperForConditionalGeneration'] is not supported by the V1 Engine. Falling back to V0. INFO 07-29 19:12:40 [cuda.py:363] Using Flash Attention backend. INFO 07-29 19:12:41 [llm_engine.py:230] Initializing a V0 LLM engine (v0.9.2) with config: model='openai/whisper-large-v3', speculative_config=None, tokenizer='openai/whisper-large-v3', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config={}, tokenizer_revision=None, trust_remote_code=True, dtype=torch.bfloat16, max_seq_len=448, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=1, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, device_config=cuda, decoding_config=DecodingConfig(backend='auto', disable_fallback=False, disable_any_whitespace=False, disable_additional_properties=False, reasoning_backend=''), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None), seed=0, served_model_name=openai/whisper-large-v3, num_scheduler_steps=1, multi_step_stream_outputs=True, enable_prefix_caching=None, chunked_prefill_enabled=False, use_async_output_proc=True, pooler_config=None, compilation_config={"level":0,"debug_dump_path":"","cache_dir":"","backend":"","custom_ops":[],"splitting_ops":[],"use_inductor":true,"compile_sizes":[],"inductor_compile_config":{"enable_auto_functionalized_v2":false},"inductor_passes":{},"use_cudagraph":true,"cudagraph_num_of_warmups":0,"cudagraph_capture_sizes":[64,56,48,40,32,24,16,8,4,2,1],"cudagraph_copy_inputs":false,"full_cuda_graph":false,"max_capture_size":64,"local_cache_dir":null}, 
use_cached_outputs=False, INFO 07-29 19:12:42 [cuda.py:363] Using Flash Attention backend. INFO 07-29 19:12:42 [parallel_state.py:1076] rank 0 in world size 1 is assigned as DP rank 0, PP rank 0, TP rank 0, EP rank 0 INFO 07-29 19:12:42 [model_runner.py:1171] Starting to load model openai/whisper-large-v3... INFO 07-29 19:12:42 [parallel_state.py:1076] rank 0 in world size 1 is assigned as DP rank 0, PP rank 0, TP rank 0, EP rank 0 INFO 07-29 19:12:42 [model_runner.py:1171] Starting to load model openai/whisper-large-v3... INFO 07-29 19:12:43 [weight_utils.py:292] Using model weights format ['.safetensors'] INFO 07-29 19:12:43 [weight_utils.py:345] No model.safetensors.index.json found in remote. Loading safetensors checkpoint shards: 0% Completed | 0/3 [00:00<?, ?it/s] Loading safetensors checkpoint shards: 33% Completed | 1/3 [00:00<00:00, 8.22it/s] INFO 07-29 19:12:43 [weight_utils.py:292] Using model weights format ['.safetensors'] INFO 07-29 19:12:43 [weight_utils.py:345] No model.safetensors.index.json found in remote. Loading safetensors checkpoint shards: 0% Completed | 0/3 [00:00<?, ?it/s] Loading safetensors checkpoint shards: 67% Completed | 2/3 [00:01<00:00, 1.43it/s] Loading safetensors checkpoint shards: 33% Completed | 1/3 [00:01<00:03, 1.60s/it] Loading safetensors checkpoint shards: 67% Completed | 2/3 [00:07<00:04, 4.07s/it] Loading safetensors checkpoint shards: 100% Completed | 3/3 [00:08<00:00, 3.56s/it] Loading safetensors checkpoint shards: 100% Completed | 3/3 [00:08<00:00, 2.73s/it]
INFO 07-29 19:12:51 [default_loader.py:272] Loading weights took 8.29 seconds Loading safetensors checkpoint shards: 100% Completed | 3/3 [00:08<00:00, 2.61s/it] Loading safetensors checkpoint shards: 100% Completed | 3/3 [00:08<00:00, 2.76s/it]
INFO 07-29 19:12:52 [default_loader.py:272] Loading weights took 8.37 seconds
INFO 07-29 19:12:52 [model_runner.py:1203] Model loading took 2.8764 GiB and 9.005007 seconds
INFO 07-29 19:12:52 [model_runner.py:1203] Model loading took 2.8764 GiB and 9.121790 seconds
INFO 07-29 19:12:53 [enc_dec_model_runner.py:315] Starting profile run for multi-modal models.
WARNING 07-29 19:12:53 [registry.py:183] WhisperProcessor did not return BatchFeature. Make sure to match the behaviour of ProcessorMixin when implementing custom processors.
INFO 07-29 19:12:53 [enc_dec_model_runner.py:315] Starting profile run for multi-modal models.
WARNING 07-29 19:12:53 [registry.py:183] WhisperProcessor did not return BatchFeature. Make sure to match the behaviour of ProcessorMixin when implementing custom processors.
INFO 07-29 19:12:55 [worker.py:294] Memory profiling takes 3.01 seconds
INFO 07-29 19:12:55 [worker.py:294] the current vLLM instance can use total_gpu_memory (39.49GiB) x gpu_memory_utilization (0.80) = 31.60GiB
INFO 07-29 19:12:55 [worker.py:294] model weights take 2.88GiB; non_torch_memory takes 0.09GiB; PyTorch activation peak memory takes 2.62GiB; the rest of the memory reserved for KV Cache is 26.00GiB.
INFO 07-29 19:12:55 [executor_base.py:113] # cuda blocks: 10650, # CPU blocks: 1638
INFO 07-29 19:12:55 [executor_base.py:118] Maximum concurrency for 448 tokens per request: 380.36x
INFO 07-29 19:12:55 [worker.py:294] Memory profiling takes 2.91 seconds
INFO 07-29 19:12:55 [worker.py:294] the current vLLM instance can use total_gpu_memory (39.49GiB) x gpu_memory_utilization (0.80) = 31.60GiB
INFO 07-29 19:12:55 [worker.py:294] model weights take 2.88GiB; non_torch_memory takes 0.09GiB; PyTorch activation peak memory takes 2.62GiB; the rest of the memory reserved for KV Cache is 26.00GiB.
INFO 07-29 19:12:56 [executor_base.py:113] # cuda blocks: 10650, # CPU blocks: 1638
INFO 07-29 19:12:56 [executor_base.py:118] Maximum concurrency for 448 tokens per request: 380.36x
INFO 07-29 19:12:57 [model_runner.py:1513] Capturing cudagraphs for decoding. This may lead to unexpected consequences if the model is not static. To run the model in eager mode, set 'enforce_eager=True' or use '--enforce-eager' in the CLI. If out-of-memory error occurs during cudagraph capture, consider decreasing gpu_memory_utilization or switching to eager mode. You can also reduce the max_num_seqs as needed to decrease memory usage.
Capturing CUDA graph shapes: 0%| | 0/11 [00:00<?, ?it/s]INFO 07-29 19:12:57 [model_runner.py:1513] Capturing cudagraphs for decoding. This may lead to unexpected consequences if the model is not static. To run the model in eager mode, set 'enforce_eager=True' or use '--enforce-eager' in the CLI. If out-of-memory error occurs during cudagraph capture, consider decreasing gpu_memory_utilization or switching to eager mode. You can also reduce the max_num_seqs as needed to decrease memory usage.
Capturing CUDA graph shapes: 100%|██████████| 11/11 [00:06<00:00, 1.63it/s]
INFO 07-29 19:13:04 [model_runner.py:1671] Graph capturing finished in 7 secs, took 0.12 GiB
Capturing CUDA graph shapes: 73%|███████▎ | 8/11 [00:06<00:02, 1.13it/s]INFO 07-29 19:13:04 [llm_engine.py:428] init engine (profile, create kv cache, warmup model) took 11.96 seconds
Capturing CUDA graph shapes: 100%|██████████| 11/11 [00:08<00:00, 1.22it/s]
INFO 07-29 19:13:06 [model_runner.py:1671] Graph capturing finished in 9 secs, took 0.12 GiB
INFO 07-29 19:13:06 [llm_engine.py:428] init engine (profile, create kv cache, warmup model) took 14.37 seconds
INFO:SUT:Starting Loadgen response thread
Adding requests: 100%|██████████| 1/1 [00:01<00:00, 1.71s/it]
Processed prompts: 100%|██████████| 1/1 [00:01<00:00, 1.07s/it, est. speed input: 3.75 toks/s, output: 62.83 toks/s]
Sample number: 0 | Step time 2.778s
Finished 224717888
Adding requests: 100%|██████████| 1/1 [00:00<00:00, 165.12it/s]
Processed prompts: 100%|██████████| 1/1 [00:01<00:00, 1.20s/it, est. speed input: 3.35 toks/s, output: 64.43 toks/s]
Sample number: 1 | Step time 1.202s
Finished 224717920
Adding requests: 100%|██████████| 1/1 [00:00<00:00, 193.78it/s]
Processed prompts: 100%|██████████| 1/1 [00:01<00:00, 1.27s/it, est. speed input: 3.16 toks/s, output: 64.82 toks/s]
Sample number: 2 | Step time 1.271s
Finished 224717952
Adding requests: 100%|██████████| 1/1 [00:00<00:00, 213.91it/s]
Processed prompts: 100%|██████████| 1/1 [00:01<00:00, 1.08s/it, est. speed input: 3.69 toks/s, output: 64.65 toks/s]
Sample number: 3 | Step time 1.088s
Finished 224717984
Adding requests: 100%|██████████| 1/1 [00:00<00:00, 188.63it/s]
Processed prompts: 100%|██████████| 1/1 [00:01<00:00, 1.21s/it, est. speed input: 3.32 toks/s, output: 64.65 toks/s]
Sample number: 4 | Step time 1.213s
Finished 224718016
Adding requests: 100%|██████████| 1/1 [00:00<00:00, 202.80it/s]
Processed prompts: 100%|██████████| 1/1 [00:01<00:00, 1.26s/it, est. speed input: 3.16 toks/s, output: 64.85 toks/s]
Sample number: 5 | Step time 1.270s
Finished 224718048
Adding requests: 100%|██████████| 1/1 [00:00<00:00, 191.63it/s]
Processed prompts: 100%|██████████| 1/1 [00:01<00:00, 1.18s/it, est. speed input: 3.40 toks/s, output: 64.65 toks/s]
Sample number: 6 | Step time 1.181s
Finished 224718080
Adding requests: 100%|██████████| 1/1 [00:00<00:00, 215.81it/s]
Processed prompts: 100%|██████████| 1/1 [00:00<00:00, 1.38it/s, est. speed input: 5.51 toks/s, output: 63.37 toks/s]
Sample number: 7 | Step time 0.731s
Finished 224718112
Adding requests: 100%|██████████| 1/1 [00:00<00:00, 210.03it/s]
Processed prompts: 100%|██████████| 1/1 [00:00<00:00, 1.16it/s, est. speed input: 4.64 toks/s, output: 63.81 toks/s]
Sample number: 8 | Step time 0.867s
Finished 224718144
Adding requests: 100%|██████████| 1/1 [00:00<00:00, 261.83it/s]
Processed prompts: 100%|██████████| 1/1 [00:00<00:00, 2.05it/s, est. speed input: 8.19 toks/s, output: 61.41 toks/s]
Sample number: 9 | Step time 0.493s
Finished 224718176
Adding requests: 100%|██████████| 1/1 [00:00<00:00, 216.55it/s]
Processed prompts: 100%|██████████| 1/1 [00:00<00:00, 1.37it/s, est. speed input: 5.49 toks/s, output: 63.16 toks/s]
Sample number: 10 | Step time 0.734s
Finished 224718208
Adding requests: 100%|██████████| 1/1 [00:00<00:00, 183.75it/s]
Processed prompts: 100%|██████████| 1/1 [00:01<00:00, 1.26s/it, est. speed input: 3.16 toks/s, output: 64.85 toks/s]
Sample number: 11 | Step time 1.271s
Finished 224718240
Adding requests: 100%|██████████| 1/1 [00:00<00:00, 233.55it/s]
Processed prompts: 100%|██████████| 1/1 [00:00<00:00, 1.57it/s, est. speed input: 6.27 toks/s, output: 62.71 toks/s]
Sample number: 12 | Step time 0.643s
Finished 224718272
Adding requests: 100%|██████████| 1/1 [00:00<00:00, 202.45it/s]
Processed prompts: 100%|██████████| 1/1 [00:01<00:00, 1.06s/it, est. speed input: 3.78 toks/s, output: 64.25 toks/s]
Sample number: 13 | Step time 1.064s
Finished 224718304
Adding requests: 100%|██████████| 1/1 [00:00<00:00, 186.40it/s]
Processed prompts: 100%|██████████| 1/1 [00:01<00:00, 1.13s/it, est. speed input: 3.53 toks/s, output: 64.41 toks/s]
Sample number: 14 | Step time 1.139s
Finished 224718336
Adding requests: 100%|██████████| 1/1 [00:00<00:00, 222.21it/s]
Processed prompts: 100%|██████████| 1/1 [00:01<00:00, 1.00s/it, est. speed input: 4.00 toks/s, output: 64.00 toks/s]
Sample number: 15 | Step time 1.005s
Finished 224718368
Adding requests: 100%|██████████| 1/1 [00:00<00:00, 202.00it/s]
Processed prompts: 100%|██████████| 1/1 [00:01<00:00, 1.24s/it, est. speed input: 3.22 toks/s, output: 64.32 toks/s]
Sample number: 16 | Step time 1.249s
Finished 224718400
Adding requests: 100%|██████████| 1/1 [00:00<00:00, 190.75it/s]
Processed prompts: 100%|██████████| 1/1 [00:01<00:00, 1.08s/it, est. speed input: 3.71 toks/s, output: 64.06 toks/s]
Sample number: 17 | Step time 1.083s
Finished 224718432
Adding requests: 100%|██████████| 1/1 [00:00<00:00, 215.14it/s]
Processed prompts: 100%|██████████| 1/1 [00:00<00:00, 1.29it/s, est. speed input: 5.15 toks/s, output: 63.12 toks/s]
Sample number: 18 | Step time 0.782s
Finished 224718464
Adding requests: 100%|██████████| 1/1 [00:00<00:00, 187.96it/s]
Processed prompts: 100%|██████████| 1/1 [00:01<00:00, 1.48s/it, est. speed input: 2.70 toks/s, output: 64.73 toks/s]
Sample number: 19 | Step time 1.489s
Finished 224718496
Adding requests: 100%|██████████| 1/1 [00:00<00:00, 219.08it/s]
Processed prompts: 100%|██████████| 1/1 [00:00<00:00, 1.22it/s, est. speed input: 4.86 toks/s, output: 63.21 toks/s]
Sample number: 20 | Step time 0.828s
Finished 224718528
Adding requests: 100%|██████████| 1/1 [00:00<00:00, 182.54it/s]
Processed prompts: 100%|██████████| 1/1 [00:01<00:00, 1.29s/it, est. speed input: 3.10 toks/s, output: 64.39 toks/s]
Sample number: 21 | Step time 1.295s
Finished 224718560
Adding requests: 100%|██████████| 1/1 [00:00<00:00, 199.83it/s]
Processed prompts: 100%|██████████| 1/1 [00:01<00:00, 1.32s/it, est. speed input: 3.03 toks/s, output: 64.44 toks/s]
Sample number: 22 | Step time 1.325s
Finished 224718592
Adding requests: 100%|██████████| 1/1 [00:00<00:00, 227.31it/s]
Processed prompts: 100%|██████████| 1/1 [00:00<00:00, 1.59it/s, est. speed input: 6.36 toks/s, output: 62.04 toks/s]
Sample number: 23 | Step time 0.634s
Finished 224718624
Adding requests: 100%|██████████| 1/1 [00:00<00:00, 184.95it/s]
Processed prompts: 100%|██████████| 1/1 [00:01<00:00, 1.36s/it, est. speed input: 2.94 toks/s, output: 64.60 toks/s]
Sample number: 24 | Step time 1.368s
Finished 224718656
Adding requests: 100%|██████████| 1/1 [00:00<00:00, 194.86it/s]
Processed prompts: 100%|██████████| 1/1 [00:01<00:00, 1.29s/it, est. speed input: 3.11 toks/s, output: 64.49 toks/s]
Sample number: 25 | Step time 1.293s
Finished 224718688
Adding requests: 100%|██████████| 1/1 [00:00<00:00, 193.71it/s]
Processed prompts: 100%|██████████| 1/1 [00:01<00:00, 1.24s/it, est. speed input: 3.22 toks/s, output: 64.44 toks/s]
Sample number: 26 | Step time 1.247s
Finished 224718720
Adding requests: 100%|██████████| 1/1 [00:00<00:00, 189.27it/s]
Processed prompts: 100%|██████████| 1/1 [00:01<00:00, 1.23s/it, est. speed input: 3.26 toks/s, output: 64.34 toks/s]
Sample number: 27 | Step time 1.234s
Finished 224718752
Adding requests: 100%|██████████| 1/1 [00:00<00:00, 188.26it/s]
Processed prompts: 100%|██████████| 1/1 [00:01<00:00, 1.44s/it, est. speed input: 2.79 toks/s, output: 64.75 toks/s]
Sample number: 28 | Step time 1.442s
Finished 224718784
Adding requests: 100%|██████████| 1/1 [00:00<00:00, 199.22it/s]
Processed prompts: 100%|██████████| 1/1 [00:01<00:00, 1.26s/it, est. speed input: 3.19 toks/s, output: 64.55 toks/s]
Sample number: 29 | Step time 1.261s
Finished 224718816
Adding requests: 100%|██████████| 1/1 [00:00<00:00, 195.25it/s]
Processed prompts: 100%|██████████| 1/1 [00:01<00:00, 1.14s/it, est. speed input: 3.52 toks/s, output: 64.30 toks/s]
Sample number: 30 | Step time 1.141s
Finished 224718848
Adding requests: 100%|██████████| 1/1 [00:00<00:00, 195.21it/s]
Processed prompts: 100%|██████████| 1/1 [00:01<00:00, 1.15s/it, est. speed input: 3.48 toks/s, output: 64.32 toks/s]
Sample number: 31 | Step time 1.156s
Finished 224718880
Adding requests: 100%|██████████| 1/1 [00:00<00:00, 194.69it/s]
Processed prompts: 100%|██████████| 1/1 [00:01<00:00, 1.35s/it, est. speed input: 2.97 toks/s, output: 64.59 toks/s]
Sample number: 32 | Step time 1.353s
Finished 224718912
Adding requests: 100%|██████████| 1/1 [00:00<00:00, 194.83it/s]
Processed prompts: 100%|██████████| 1/1 [00:01<00:00, 1.20s/it, est. speed input: 3.35 toks/s, output: 64.42 toks/s]
Sample number: 33 | Step time 1.201s
Finished 224718944
Adding requests: 100%|██████████| 1/1 [00:00<00:00, 189.28it/s]
Processed prompts: 100%|██████████| 1/1 [00:01<00:00, 1.41s/it, est. speed input: 2.84 toks/s, output: 64.71 toks/s]
Sample number: 34 | Step time 1.412s
Finished 224718976
Adding requests: 100%|██████████| 1/1 [00:00<00:00, 193.16it/s]
Processed prompts: 100%|██████████| 1/1 [00:01<00:00, 1.44s/it, est. speed input: 2.78 toks/s, output: 64.70 toks/s]
Sample number: 35 | Step time 1.443s
Finished 224719008
Adding requests: 100%|██████████| 1/1 [00:00<00:00, 187.84it/s]
Processed prompts: 100%|██████████| 1/1 [00:01<00:00, 1.63s/it, est. speed input: 2.45 toks/s, output: 64.95 toks/s]
Sample number: 36 | Step time 1.638s
Finished 224719040
Adding requests: 100%|██████████| 1/1 [00:00<00:00, 209.39it/s]
Processed prompts: 100%|██████████| 1/1 [00:01<00:00, 1.09s/it, est. speed input: 3.67 toks/s, output: 64.25 toks/s]
Sample number: 37 | Step time 1.095s
Finished 224719072
Adding requests: 100%|██████████| 1/1 [00:00<00:00, 202.71it/s]
Processed prompts: 100%|██████████| 1/1 [00:00<00:00, 1.14it/s, est. speed input: 4.55 toks/s, output: 63.63 toks/s]
Sample number: 38 | Step time 0.886s
Finished 224719104
Adding requests: 100%|██████████| 1/1 [00:00<00:00, 251.61it/s]
Processed prompts: 100%|██████████| 1/1 [00:00<00:00, 2.59it/s, est. speed input: 10.37 toks/s, output: 59.63 toks/s]
Sample number: 39 | Step time 0.390s
Finished 224719136
Adding requests: 100%|██████████| 1/1 [00:00<00:00, 180.27it/s]
Processed prompts: 100%|██████████| 1/1 [00:01<00:00, 1.33s/it, est. speed input: 3.01 toks/s, output: 64.66 toks/s]
Sample number: 40 | Step time 1.336s
Finished 224719168
Adding requests: 100%|██████████| 1/1 [00:00<00:00, 197.82it/s]
Processed prompts: 100%|██████████| 1/1 [00:01<00:00, 1.47s/it, est. speed input: 2.73 toks/s, output: 64.80 toks/s]
Sample number: 41 | Step time 1.472s
Finished 224719200
Adding requests: 100%|██████████| 1/1 [00:00<00:00, 216.64it/s]
Processed prompts: 100%|██████████| 1/1 [00:00<00:00, 1.31it/s, est. speed input: 5.25 toks/s, output: 62.95 toks/s]
Sample number: 42 | Step time 0.768s
Finished 224719232
Adding requests: 100%|██████████| 1/1 [00:00<00:00, 186.48it/s]
Processed prompts: 100%|██████████| 1/1 [00:01<00:00, 1.45s/it, est. speed input: 2.76 toks/s, output: 64.86 toks/s]
Sample number: 43 | Step time 1.455s
Finished 224719264
Adding requests: 100%|██████████| 1/1 [00:00<00:00, 244.37it/s]
Processed prompts: 100%|██████████| 1/1 [00:00<00:00, 1.72it/s, est. speed input: 6.90 toks/s, output: 62.09 toks/s]
Sample number: 44 | Step time 0.585s
Finished 224719296
Adding requests: 100%|██████████| 1/1 [00:00<00:00, 186.24it/s]
Processed prompts: 100%|██████████| 1/1 [00:01<00:00, 1.20s/it, est. speed input: 3.34 toks/s, output: 64.33 toks/s]
Sample number: 45 | Step time 1.203s
Finished 224719328
Adding requests: 100%|██████████| 1/1 [00:00<00:00, 242.54it/s]
Processed prompts: 100%|██████████| 1/1 [00:00<00:00, 1.87it/s, est. speed input: 7.46 toks/s, output: 61.57 toks/s]
Sample number: 46 | Step time 0.541s
Finished 224719360
Adding requests: 100%|██████████| 1/1 [00:00<00:00, 181.81it/s]
Processed prompts: 100%|██████████| 1/1 [00:01<00:00, 1.39s/it, est. speed input: 2.88 toks/s, output: 64.77 toks/s]
Sample number: 47 | Step time 1.396s
Finished 224719392
Adding requests: 100%|██████████| 1/1 [00:00<00:00, 189.01it/s]
Processed prompts: 100%|██████████| 1/1 [00:01<00:00, 1.58s/it, est. speed input: 2.54 toks/s, output: 64.74 toks/s]
Sample number: 48 | Step time 1.582s
Finished 224719424
Adding requests: 100%|██████████| 1/1 [00:00<00:00, 209.35it/s]
Processed prompts: 100%|██████████| 1/1 [00:01<00:00, 1.06s/it, est. speed input: 3.78 toks/s, output: 64.18 toks/s]
Sample number: 49 | Step time 1.065s
Finished 224719456
Adding requests: 100%|██████████| 1/1 [00:00<00:00, 192.17it/s]
Processed prompts: 100%|██████████| 1/1 [00:01<00:00, 1.15s/it, est. speed input: 3.48 toks/s, output: 64.30 toks/s]
Sample number: 50 | Step time 1.157s
Finished 224719488
Adding requests: 100%|██████████| 1/1 [00:00<00:00, 205.24it/s]
Processed prompts: 100%|██████████| 1/1 [00:00<00:00, 1.20it/s, est. speed input: 4.79 toks/s, output: 63.49 toks/s]
Sample number: 51 | Step time 0.840s
Finished 224719520
Adding requests: 100%|██████████| 1/1 [00:00<00:00, 187.82it/s]
Processed prompts: 100%|██████████| 1/1 [00:01<00:00, 1.70s/it, est. speed input: 2.35 toks/s, output: 65.11 toks/s]
Sample number: 52 | Step time 1.711s
Finished 224719552
Adding requests: 100%|██████████| 1/1 [00:00<00:00, 193.31it/s]
Processed prompts: 100%|██████████| 1/1 [00:01<00:00, 1.35s/it, est. speed input: 2.97 toks/s, output: 64.66 toks/s]
Sample number: 53 | Step time 1.351s
Finished 224719584
Adding requests: 100%|██████████| 1/1 [00:00<00:00, 199.24it/s]
Processed prompts: 100%|██████████| 1/1 [00:01<00:00, 1.28s/it, est. speed input: 3.11 toks/s, output: 64.63 toks/s]
Sample number: 54 | Step time 1.290s
Finished 224719616
Adding requests: 100%|██████████| 1/1 [00:00<00:00, 195.37it/s]
Processed prompts: 100%|██████████| 1/1 [00:00<00:00, 1.20it/s, est. speed input: 4.79 toks/s, output: 63.41 toks/s]
Sample number: 55 | Step time 0.842s
Finished 224719648
Adding requests: 100%|██████████| 1/1 [00:00<00:00, 192.79it/s]
Processed prompts: 100%|██████████| 1/1 [00:01<00:00, 1.24s/it, est. speed input: 3.22 toks/s, output: 64.47 toks/s]
Sample number: 56 | Step time 1.247s
Finished 224719680
Adding requests: 100%|██████████| 1/1 [00:00<00:00, 188.98it/s]
Processed prompts: 100%|██████████| 1/1 [00:01<00:00, 1.13s/it, est. speed input: 3.53 toks/s, output: 64.34 toks/s]
Sample number: 57 | Step time 1.141s
Finished 224719712
Adding requests: 100%|██████████| 1/1 [00:00<00:00, 224.61it/s]
Processed prompts: 100%|██████████| 1/1 [00:00<00:00, 1.06it/s, est. speed input: 4.26 toks/s, output: 63.83 toks/s]
Sample number: 58 | Step time 0.945s
Finished 224719744
Adding requests: 100%|██████████| 1/1 [00:00<00:00, 193.56it/s]
Processed prompts: 100%|██████████| 1/1 [00:00<00:00, 1.00it/s, est. speed input: 4.00 toks/s, output: 64.01 toks/s]
Sample number: 59 | Step time 1.006s
Finished 224719776
Adding requests: 100%|██████████| 1/1 [00:00<00:00, 205.65it/s]
Processed prompts: 100%|██████████| 1/1 [00:00<00:00, 1.18it/s, est. speed input: 4.70 toks/s, output: 63.48 toks/s]
Sample number: 60 | Step time 0.856s
Finished 224719808
Adding requests: 100%|██████████| 1/1 [00:00<00:00, 184.71it/s]
Processed prompts: 100%|██████████| 1/1 [00:01<00:00, 1.56s/it, est. speed input: 2.57 toks/s, output: 64.85 toks/s]
Sample number: 61 | Step time 1.563s
Finished 224719840
Adding requests: 100%|██████████| 1/1 [00:00<00:00, 198.17it/s]
Processed prompts: 100%|██████████| 1/1 [00:01<00:00, 1.47s/it, est. speed input: 2.73 toks/s, output: 64.78 toks/s]
Sample number: 62 | Step time 1.472s
Finished 224719872
Adding requests: 100%|██████████| 1/1 [00:00<00:00, 195.62it/s]
Processed prompts: 100%|██████████| 1/1 [00:01<00:00, 1.13s/it, est. speed input: 3.53 toks/s, output: 64.33 toks/s]
Sample number: 63 | Step time 1.140s
Finished 224719904
Adding requests: 100%|██████████| 1/1 [00:00<00:00, 201.72it/s]
Processed prompts: 100%|██████████| 1/1 [00:00<00:00, 1.01it/s, est. speed input: 4.06 toks/s, output: 63.90 toks/s]
Sample number: 64 | Step time 0.992s
Finished 224719936
Adding requests: 100%|██████████| 1/1 [00:00<00:00, 185.35it/s]
Processed prompts: 100%|██████████| 1/1 [00:01<00:00, 1.44s/it, est. speed input: 2.79 toks/s, output: 64.75 toks/s]
Sample number: 65 | Step time 1.442s
Finished 224719968
Adding requests: 100%|██████████| 1/1 [00:00<00:00, 190.16it/s]
Processed prompts: 100%|██████████| 1/1 [00:01<00:00, 1.59s/it, est. speed input: 2.52 toks/s, output: 64.94 toks/s]
Sample number: 66 | Step time 1.592s
Finished 224720000
Adding requests: 100%|██████████| 1/1 [00:00<00:00, 191.19it/s]
Processed prompts: 100%|██████████| 1/1 [00:01<00:00, 1.20s/it, est. speed input: 3.35 toks/s, output: 64.41 toks/s]
Sample number: 67 | Step time 1.201s
Finished 224720032
Adding requests: 100%|██████████| 1/1 [00:00<00:00, 188.58it/s]
Processed prompts: 100%|██████████| 1/1 [00:01<00:00, 1.27s/it, est. speed input: 3.15 toks/s, output: 64.56 toks/s]
Sample number: 68 | Step time 1.276s
Finished 224720064
Adding requests: 100%|██████████| 1/1 [00:00<00:00, 214.59it/s]
Processed prompts: 100%|██████████| 1/1 [00:00<00:00, 1.19it/s, est. speed input: 4.78 toks/s, output: 63.33 toks/s]
Sample number: 69 | Step time 0.842s
Finished 224720096
Adding requests: 100%|██████████| 1/1 [00:00<00:00, 196.73it/s]
Processed prompts: 100%|██████████| 1/1 [00:01<00:00, 1.02s/it, est. speed input: 3.94 toks/s, output: 64.00 toks/s]
Sample number: 70 | Step time 1.021s
Finished 224720128
Adding requests: 100%|██████████| 1/1 [00:00<00:00, 196.17it/s]
Processed prompts: 100%|██████████| 1/1 [00:01<00:00, 1.32s/it, est. speed input: 3.04 toks/s, output: 64.59 toks/s]
Sample number: 71 | Step time 1.322s
Finished 224720160
Adding requests: 100%|██████████| 1/1 [00:00<00:00, 192.34it/s]
Processed prompts: 100%|██████████| 1/1 [00:01<00:00, 1.52s/it, est. speed input: 2.62 toks/s, output: 64.94 toks/s]
Sample number: 72 | Step time 1.530s
Finished 224720192
Adding requests: 100%|██████████| 1/1 [00:00<00:00, 213.15it/s]
Processed prompts: 100%|██████████| 1/1 [00:00<00:00, 1.16it/s, est. speed input: 4.63 toks/s, output: 63.66 toks/s]
Sample number: 73 | Step time 0.869s
Finished 224720224
Adding requests: 100%|██████████| 1/1 [00:00<00:00, 191.16it/s]
Processed prompts: 100%|██████████| 1/1 [00:01<00:00, 1.09s/it, est. speed input: 3.67 toks/s, output: 64.14 toks/s]
Sample number: 74 | Step time 1.097s
Finished 224720256
Adding requests: 100%|██████████| 1/1 [00:00<00:00, 197.65it/s]
Processed prompts: 100%|██████████| 1/1 [00:01<00:00, 1.11s/it, est. speed input: 3.62 toks/s, output: 64.20 toks/s]
Sample number: 75 | Step time 1.112s
Finished 224720288
Adding requests: 100%|██████████| 1/1 [00:00<00:00, 186.91it/s]
Processed prompts: 100%|██████████| 1/1 [00:01<00:00, 1.56s/it, est. speed input: 2.57 toks/s, output: 64.95 toks/s]
Sample number: 76 | Step time 1.561s
Finished 224720320
Adding requests: 100%|██████████| 1/1 [00:00<00:00, 196.83it/s]
Processed prompts: 100%|██████████| 1/1 [00:01<00:00, 1.28s/it, est. speed input: 3.12 toks/s, output: 64.64 toks/s]
Sample number: 77 | Step time 1.290s
Finished 224720352
Adding requests: 100%|██████████| 1/1 [00:00<00:00, 206.25it/s]
Processed prompts: 100%|██████████| 1/1 [00:01<00:00, 1.18s/it, est. speed input: 3.39 toks/s, output: 64.41 toks/s]
Sample number: 78 | Step time 1.185s
Finished 224720384
Adding requests: 100%|██████████| 1/1 [00:00<00:00, 190.01it/s]
Processed prompts: 100%|██████████| 1/1 [00:01<00:00, 1.29s/it, est. speed input: 3.11 toks/s, output: 64.56 toks/s]
Sample number: 79 | Step time 1.292s
Finished 224720416
Adding requests: 100%|██████████| 1/1 [00:00<00:00, 187.13it/s]
Processed prompts: 100%|██████████| 1/1 [00:01<00:00, 1.12s/it, est. speed input: 3.57 toks/s, output: 64.32 toks/s]
Sample number: 80 | Step time 1.125s
Finished 224720448
Adding requests: 100%|██████████| 1/1 [00:00<00:00, 192.42it/s]
Processed prompts: 100%|██████████| 1/1 [00:01<00:00, 1.76s/it, est. speed input: 2.27 toks/s, output: 65.19 toks/s]
Sample number: 81 | Step time 1.770s
Finished 224720480
Adding requests: 100%|██████████| 1/1 [00:00<00:00, 187.57it/s]
Processed prompts: 100%|██████████| 1/1 [00:01<00:00, 1.06s/it, est. speed input: 3.78 toks/s, output: 64.20 toks/s]
Sample number: 82 | Step time 1.065s
Finished 224720512
Adding requests: 100%|██████████| 1/1 [00:00<00:00, 191.22it/s]
Processed prompts: 100%|██████████| 1/1 [00:01<00:00, 1.29s/it, est. speed input: 3.11 toks/s, output: 64.50 toks/s]
Sample number: 83 | Step time 1.293s
Finished 224720544
Adding requests: 100%|██████████| 1/1 [00:00<00:00, 188.00it/s]
Processed prompts: 100%|██████████| 1/1 [00:01<00:00, 1.26s/it, est. speed input: 3.18 toks/s, output: 64.49 toks/s]
Sample number: 84 | Step time 1.262s
Finished 224720576
Adding requests: 100%|██████████| 1/1 [00:00<00:00, 192.91it/s]
Processed prompts: 100%|██████████| 1/1 [00:01<00:00, 1.26s/it, est. speed input: 3.19 toks/s, output: 64.51 toks/s]
Sample number: 85 | Step time 1.261s
Finished 224720608
Adding requests: 100%|██████████| 1/1 [00:00<00:00, 194.69it/s]
Processed prompts: 100%|██████████| 1/1 [00:01<00:00, 1.01s/it, est. speed input: 3.94 toks/s, output: 64.09 toks/s]
Sample number: 86 | Step time 1.020s
Finished 224720640
Adding requests: 100%|██████████| 1/1 [00:00<00:00, 192.11it/s]
Processed prompts: 100%|██████████| 1/1 [00:01<00:00, 1.29s/it, est. speed input: 3.11 toks/s, output: 64.58 toks/s]
Sample number: 87 | Step time 1.291s
Finished 224720672
Adding requests: 100%|██████████| 1/1 [00:00<00:00, 187.13it/s]
Processed prompts: 100%|██████████| 1/1 [00:01<00:00, 1.62s/it, est. speed input: 2.48 toks/s, output: 64.97 toks/s]
Sample number: 88 | Step time 1.622s
Finished 224720704
Adding requests: 100%|██████████| 1/1 [00:00<00:00, 193.13it/s]
Processed prompts: 100%|██████████| 1/1 [00:01<00:00, 1.32s/it, est. speed input: 3.04 toks/s, output: 64.59 toks/s]
Sample number: 89 | Step time 1.322s
Finished 224720736
Adding requests: 100%|██████████| 1/1 [00:00<00:00, 193.79it/s]
Processed prompts: 100%|██████████| 1/1 [00:01<00:00, 1.36s/it, est. speed input: 2.94 toks/s, output: 64.66 toks/s]
Sample number: 90 | Step time 1.367s
Finished 224720768
Adding requests: 100%|██████████| 1/1 [00:00<00:00, 199.90it/s]
Processed prompts: 100%|██████████| 1/1 [00:01<00:00, 1.22s/it, est. speed input: 3.27 toks/s, output: 64.53 toks/s]
Sample number: 91 | Step time 1.230s
Finished 224720800
Adding requests: 100%|██████████| 1/1 [00:00<00:00, 187.21it/s]
Processed prompts: 100%|██████████| 1/1 [00:02<00:00, 2.22s/it, est. speed input: 1.80 toks/s, output: 65.41 toks/s]
Sample number: 92 | Step time 2.223s
Finished 224720832
Adding requests: 100%|██████████| 1/1 [00:00<00:00, 188.45it/s]
Processed prompts: 100%|██████████| 1/1 [00:01<00:00, 1.45s/it, est. speed input: 2.75 toks/s, output: 64.70 toks/s]
Sample number: 93 | Step time 1.459s
Finished 224720864
Adding requests: 100%|██████████| 1/1 [00:00<00:00, 191.28it/s]
Processed prompts: 100%|██████████| 1/1 [00:01<00:00, 1.02s/it, est. speed input: 3.94 toks/s, output: 64.03 toks/s]
Sample number: 94 | Step time 1.021s
Finished 224720896
Adding requests: 100%|██████████| 1/1 [00:00<00:00, 196.20it/s]
Processed prompts: 100%|██████████| 1/1 [00:01<00:00, 1.33s/it, est. speed input: 3.01 toks/s, output: 64.64 toks/s]
Sample number: 95 | Step time 1.336s
Finished 224720928
Adding requests: 100%|██████████| 1/1 [00:00<00:00, 206.63it/s]
Processed prompts: 100%|██████████| 1/1 [00:00<00:00, 1.01it/s, est. speed input: 4.06 toks/s, output: 63.88 toks/s]
Sample number: 96 | Step time 0.992s
Finished 224720960
Adding requests: 100%|██████████| 1/1 [00:00<00:00, 189.78it/s]
Processed prompts: 100%|██████████| 1/1 [00:00<00:00, 1.34it/s, est. speed input: 5.37 toks/s, output: 63.08 toks/s]
Sample number: 97 | Step time 0.751s
Finished 224720992
Adding requests: 100%|██████████| 1/1 [00:00<00:00, 227.48it/s]
Processed prompts: 100%|██████████| 1/1 [00:00<00:00, 1.22it/s, est. speed input: 4.87 toks/s, output: 63.35 toks/s]
Sample number: 98 | Step time 0.826s
Finished 224721024
Adding requests: 100%|██████████| 1/1 [00:00<00:00, 182.99it/s]
Processed prompts: 100%|██████████| 1/1 [00:01<00:00, 1.21s/it, est. speed input: 3.31 toks/s, output: 64.48 toks/s]
Sample number: 99 | Step time 1.216s
Finished 224721056
Adding requests: 100%|██████████| 1/1 [00:00<00:00, 189.28it/s]
Processed prompts: 100%|██████████| 1/1 [00:01<00:00, 1.57s/it, est. speed input: 2.55 toks/s, output: 64.97 toks/s]
Sample number: 100 | Step time 1.576s
Finished 224721088
Adding requests: 100%|██████████| 1/1 [00:00<00:00, 232.47it/s]
Processed prompts: 100%|██████████| 1/1 [00:00<00:00, 1.64it/s, est. speed input: 6.55 toks/s, output: 62.19 toks/s]
Sample number: 101 | Step time 0.616s
Finished 224721120
Adding requests: 100%|██████████| 1/1 [00:00<00:00, 184.43it/s]
Processed prompts: 100%|██████████| 1/1 [00:01<00:00, 1.39s/it, est. speed input: 2.87 toks/s, output: 64.65 toks/s]
Sample number: 102 | Step time 1.398s
Finished 224721152
Adding requests: 100%|██████████| 1/1 [00:00<00:00, 189.65it/s]
Processed prompts: 100%|██████████| 1/1 [00:01<00:00, 1.53s/it, est. speed input: 2.62 toks/s, output: 64.81 toks/s]
Sample number: 103 | Step time 1.534s
Finished 224721184
Adding requests: 100%|██████████| 1/1 [00:00<00:00, 186.69it/s]
Processed prompts: 100%|██████████| 1/1 [00:01<00:00, 1.26s/it, est. speed input: 3.18 toks/s, output: 64.48 toks/s]
Sample number: 104 | Step time 1.262s
Finished 224721216
Adding requests: 100%|██████████| 1/1 [00:00<00:00, 204.09it/s]
Processed prompts: 100%|██████████| 1/1 [00:01<00:00, 1.27s/it, est. speed input: 3.15 toks/s, output: 64.52 toks/s]
(mlperf) [email protected]:/work/build/inference/speech2text$
I believe this PR would have fixed this issue.
Hi Arjun,
Yes, that fixed the issue. Thanks!
I have two questions about running "sh ./reference_mlperf_accuracy.sh". The test bed has two A100 GPUs. First, most of the time only GPU 0 was utilized, and only at about 34%. Second, GPU 1 was never utilized. Are there ways to fix these two issues? If you would like me to open another ticket, please let me know.
Thanks! Jean
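For anyone who wants to put numbers on the per-GPU utilization described above, here is a minimal sketch that polls `nvidia-smi` and parses its CSV output. It is not part of the reference implementation; the function names are hypothetical, and the sample values in the demo only mirror the figures reported in this comment (about 34% on GPU 0, nothing on GPU 1).

```python
import subprocess

# Standard nvidia-smi query: one "index, utilization" line per GPU.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,utilization.gpu",
    "--format=csv,noheader,nounits",
]


def parse_utilization(csv_text):
    """Turn 'index, utilization' lines into {gpu_index: percent}."""
    result = {}
    for line in csv_text.strip().splitlines():
        fields = [field.strip() for field in line.split(",")]
        if len(fields) != 2 or not fields[1].isdigit():
            # Skip malformed lines or GPUs that do not report a number.
            continue
        result[int(fields[0])] = int(fields[1])
    return result


def sample_utilization():
    """Run nvidia-smi once and return the parsed utilization (needs an NVIDIA driver)."""
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return parse_utilization(out)


# Illustrative input only, shaped like nvidia-smi output for two GPUs.
example_output = "0, 34\n1, 0\n"
print(parse_utilization(example_output))
```

Calling `sample_utilization()` in a loop while the accuracy script runs would show whether the second GPU ever receives work.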
@jhsiao1948 Reference implementations are typically not optimised, which implies low utilisation of resources. Submitters usually create optimised implementations for their submissions. You can expect to find well-optimised implementations when the v5.1 results are published in mid-September. Having said that, A100 GPUs are pretty old by now, so your mileage may vary.
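The console log earlier in this thread appears to show one request being processed at a time ("1/1" per step), which is consistent with an unoptimised reference run. As a rough sketch, the per-sample step times can be pulled out of that log and summarised. The regular expression assumes the "Sample number: N | Step time X.XXXs" line format seen above, and the helper names are hypothetical.

```python
import re
import statistics

# Matches lines such as "Sample number: 25 | Step time 1.293s".
STEP_RE = re.compile(r"Sample number:\s*(\d+)\s*\|\s*Step time\s*([0-9.]+)s")


def step_times(log_text):
    """Return the step times (in seconds) found in the log text, in order."""
    return [float(match.group(2)) for match in STEP_RE.finditer(log_text)]


def summarize(times):
    """Basic statistics over a non-empty list of step times."""
    return {
        "count": len(times),
        "mean_s": statistics.mean(times),
        "max_s": max(times),
    }


# Three lines copied from the log above; other log lines are ignored.
sample_log = """Sample number: 25 | Step time 1.293s
Finished 224718688
Sample number: 26 | Step time 1.247s
Finished 224718720
Sample number: 27 | Step time 1.234s
"""

print(summarize(step_times(sample_log)))
```

Running this over the full console output gives a baseline to compare against once an optimised implementation is available.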