Issue with Running benchmark.py
I am encountering an issue while using the following setup:
- Tool: openvino.genai/tools/llm_bench/benchmark.py
- Device: Intel Meteor Lake
- Environment: Ubuntu 24.04 / OpenVINO 2024.5.0 / GPU Driver / NPU Driver
- Model used:
  - CPU / iGPU: llama-3.1-8b-instruct (INT4)
  - NPU: llama-3-8b-instruct (INT4-NPU)
When running the command, I received the following error message:
```
python3 benchmark.py -m <path>/llama-3.1-8b-instruct/ -n 2 -d CPU -p "What is large language model (LLM)?" -ic 50
```
Message:
Run on CPU:

```
(llama3) adv@adv-Default-string:~/Downloads/openvino.genai/tools/llm_bench$ python3 benchmark.py -m /opt/Advantech/EdgeAISuite/Intel_Standard/GenAI/LLM/model/llama-3.1-8b-instruct/ -n 2 -d CPU -p "What is large language model (LLM)? please reply under 100 words" -ic 50
The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers, 8-bit multiplication, and GPU quantization are unavailable.
[ INFO ] ==SUCCESS FOUND==: use_case: text_gen, model_type: llama-3.1-8b-instruct
[ INFO ] OV Config={'CACHE_DIR': ''}
[ WARNING ] It is recommended to set the environment variable OMP_WAIT_POLICY to PASSIVE, so that OpenVINO inference can use all CPU resources without waiting.
[ INFO ] The num_beams is 1, update Torch thread num from 16 to 8, avoid to use the CPU cores for OpenVINO inference.
[ INFO ] Model path=/opt/Advantech/EdgeAISuite/Intel_Standard/GenAI/LLM/model/llama-3.1-8b-instruct, openvino runtime version: 2024.5.0-17288-7975fa5da0c-refs/pull/3856/head
[ INFO ] Selected OpenVINO GenAI for benchmarking
[ INFO ] Pipeline initialization time: 1.47s
[ INFO ] Numbeams: 1, benchmarking iter nums(exclude warm-up): 2, prompt nums: 1, prompt idx: [0]
[ INFO ] [warm-up][P0] Input text: What is large language model (LLM)? please reply under 100 words
[ ERROR ] An exception occurred
[ INFO ] Traceback (most recent call last):
  File "/home/adv/Downloads/openvino.genai/tools/llm_bench/benchmark.py", line 229, in main
    iter_data_list, pretrain_time, iter_timestamp = CASE_TO_BENCH[model_args['use_case']](
  File "/home/adv/Downloads/openvino.genai/tools/llm_bench/task/text_generation.py", line 515, in run_text_generation_benchmark
    text_gen_fn(input_text, num, model, tokenizer, args, iter_data_list, md5_list,
  File "/home/adv/Downloads/openvino.genai/tools/llm_bench/task/text_generation.py", line 305, in run_text_generation_genai
    inference_durations = (np.array(perf_metrics.raw_metrics.token_infer_durations) / 1000 / 1000).tolist()
AttributeError: 'openvino_genai.py_openvino_genai.RawPerfMetrics' object has no attribute 'token_infer_durations'. Did you mean: 'tokenization_durations'?
```
Run on NPU:
```
(llama3) adv@adv-Default-string:~/Downloads/openvino.genai/tools/llm_bench$ python3 benchmark.py -m /home/adv/Downloads/openvino_notebooks/notebooks/llm-chatbot/llama/INT4-NPU_compressed_weights/ -n 2 -d NPU -p "What is large language model (LLM)? please reply under 100 words" -ic 50
The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers, 8-bit multiplication, and GPU quantization are unavailable.
[ INFO ] ==SUCCESS FOUND==: use_case: text_gen, model_type: llama
[ INFO ] OV Config={'CACHE_DIR': ''}
[ INFO ] Model path=/home/adv/Downloads/openvino_notebooks/notebooks/llm-chatbot/llama/INT4-NPU_compressed_weights, openvino runtime version: 2024.5.0-17288-7975fa5da0c-refs/pull/3856/head
[ INFO ] Selected OpenVINO GenAI for benchmarking
[ INFO ] Pipeline initialization time: 21.64s
[ INFO ] Numbeams: 1, benchmarking iter nums(exclude warm-up): 2, prompt nums: 1, prompt idx: [0]
[ INFO ] [warm-up][P0] Input text: What is large language model (LLM)? please reply under 100 words
[ ERROR ] An exception occurred
[ INFO ] Traceback (most recent call last):
  File "/home/adv/Downloads/openvino.genai/tools/llm_bench/benchmark.py", line 229, in main
    iter_data_list, pretrain_time, iter_timestamp = CASE_TO_BENCH[model_args['use_case']](
  File "/home/adv/Downloads/openvino.genai/tools/llm_bench/task/text_generation.py", line 515, in run_text_generation_benchmark
    text_gen_fn(input_text, num, model, tokenizer, args, iter_data_list, md5_list,
  File "/home/adv/Downloads/openvino.genai/tools/llm_bench/task/text_generation.py", line 305, in run_text_generation_genai
    inference_durations = (np.array(perf_metrics.raw_metrics.token_infer_durations) / 1000 / 1000).tolist()
AttributeError: 'openvino_genai.py_openvino_genai.RawPerfMetrics' object has no attribute 'token_infer_durations'. Did you mean: 'tokenization_durations'?
```
Could you please help me resolve this issue?
@tim102187S You need to update your openvino-genai package and install it from the nightly repository:
```
pip install -U --pre --extra-index-url https://storage.openvinotoolkit.org/simple/wheels/nightly openvino_tokenizers openvino openvino-genai
```
After that, these metrics become available. Alternatively, you can switch to the 2024/6 branch if you would like a version of llm_bench compatible with the latest stable release.
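For anyone who cannot update immediately: the crash happens because the older wheel's `RawPerfMetrics` lacks the `token_infer_durations` attribute that llm_bench reads. A defensive sketch like the one below (the `OldRawPerfMetrics` class and helper are hypothetical, for illustration only; they are not part of llm_bench) shows how the conversion at `text_generation.py` line 305 could degrade gracefully instead of raising `AttributeError`:

```python
import numpy as np

# Hypothetical stand-in for the RawPerfMetrics object shipped in the older
# 2024.5 wheel: it exposes tokenization_durations, but not the
# token_infer_durations attribute that llm_bench expects.
class OldRawPerfMetrics:
    tokenization_durations = [120.0, 95.0]  # microseconds

def token_infer_durations_sec(raw_metrics):
    # Mirror llm_bench's microseconds -> seconds conversion, but return an
    # empty list when the attribute is missing instead of raising.
    durations_us = getattr(raw_metrics, "token_infer_durations", None)
    if durations_us is None:
        return []  # metric unavailable in this openvino-genai build
    return (np.array(durations_us) / 1000 / 1000).tolist()

print(token_infer_durations_sec(OldRawPerfMetrics()))  # → []
```

With the nightly wheel installed, the attribute exists and the same helper returns the per-token durations in seconds.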
I was having the same issue, and the suggestion from @eaidova resolved it. Thank you.
I have recently updated my OpenVINO-GenAI package and switched to version 2024/6. While the issue with the CPU has been resolved, I am now encountering a problem when using the NPU.
- Tool: openvino.genai/tools/llm_bench/benchmark.py
- Device: Intel Meteor Lake
- Environment: Ubuntu 24.04 / OpenVINO 2024.5.0 / NPU Driver 1.10.0
- Model used:
  - NPU: llama-3-8b-instruct (INT4-NPU)
When running the following command:
```
python3 benchmark.py -m /home/adv/Downloads/openvino_notebooks/notebooks/llm-chatbot/llama/INT4-NPU_compressed_weights/ -n 2 -d NPU -p "What is large language model (LLM)?" -ic 50
```
Message:
Run on NPU (after approximately 5 minutes of execution):

```
(openvino_one) adv@adv-Default-string:~/Downloads/openvino.genai/tools/llm_bench$ python3 benchmark.py -m /home/adv/Downloads/openvino_notebooks/notebooks/llm-chatbot/llama/INT4-NPU_compressed_weights/ -n 2 -d NPU -p "What is large language model (LLM)" -ic 50
The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers, 8-bit multiplication, and GPU quantization are unavailable.
[ INFO ] ==SUCCESS FOUND==: use_case: text_gen, model_type: llama
[ INFO ] OV Config={'CACHE_DIR': ''}
[ INFO ] Model path=/home/adv/Downloads/openvino_notebooks/notebooks/llm-chatbot/llama/INT4-NPU_compressed_weights, openvino runtime version: 2025.0.0-17709-688f0428cfc
Segmentation fault (core dumped)
```
@tim102187S I see you listed the NPU driver as version 1.10.0; could you try the latest v1.10.1 version here? This version contains a fix for LLMs. Please give it a go and let us know if the issue persists.
Hello @avitial, I have updated my NPU driver to the latest v1.10.1 version as suggested, but I am still encountering the same issue. Please let me know if there are any additional steps I should take or further details I can provide to help resolve this.
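One way to narrow this down is to take llm_bench out of the picture and load the model on the NPU with a bare `openvino_genai.LLMPipeline` call. The sketch below assumes the nightly openvino-genai wheel is installed; the model directory argument is a placeholder you would point at your INT4-NPU compressed weights. If this minimal script also segfaults, the crash is in the GenAI runtime or NPU plugin/driver stack rather than in benchmark.py:

```python
# Minimal NPU reproducer, independent of llm_bench (sketch only).
def run_npu_repro(model_dir: str) -> str:
    # Imported lazily so the sketch parses even without the wheel installed.
    import openvino_genai as ov_genai

    # LLMPipeline(model_dir, device) compiles the model for the NPU plugin;
    # a segfault at this point would implicate the runtime/driver stack.
    pipe = ov_genai.LLMPipeline(model_dir, "NPU")
    return pipe.generate("What is large language model (LLM)?", max_new_tokens=50)

# Usage (placeholder path):
#   print(run_npu_repro("/path/to/INT4-NPU_compressed_weights"))
```

Attaching a backtrace from the core dump (e.g. via gdb) alongside this repro would also help the maintainers pinpoint where the fault occurs.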