openvino.genai
Is there sample code to run on NPU?
Where can I get sample code to run an LLM on NPU? Thanks.
https://github.com/openvinotoolkit/openvino.genai/pull/576
#576
What do you mean?
I don't understand what you mean.
@Edward-Lin the referenced PR (not yet merged to the master branch) seems to contain code changes to support the LLM pipeline with static shapes out-of-the-box for the NPU plugin. Feel free to try the PR to run chat_sample, or wait until it is merged to master. Hope this helps.
Thanks. BTW, what is a PR, and how can I get it?
I think I've got it and will give it a try. I will update later. Thanks.
https://github.com/TolyaTalamanov/openvino.genai/tree/at/static-llm-pipeline-out-of-the-box I've checked out the code, but it only supports the C++ version, so I need to try to compile it, and I'm not sure whether it can run on NPU or not. From #576, it looks like it should not work yet.
Hi! @aoke79,
Once https://github.com/openvinotoolkit/openvino.genai/pull/576/ is merged, you will be able to run LLMPipeline on NPU out-of-the-box by using the following code snippet:
ov::genai::LLMPipeline pipe(model_path, "NPU");
ov::genai::GenerationConfig config;
config.max_new_tokens = 100; // optional
std::cout << pipe.generate("Why is the Sun yellow?", config) << std::endl;
Unfortunately, it doesn't support chat mode yet (it will be introduced in https://github.com/openvinotoolkit/openvino.genai/pull/580), so chat_sample.cpp cannot be used so far.
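For anyone trying that snippet standalone: a minimal self-contained sketch might look like the following (the include, the command-line handling and the main() wrapper are added here for illustration; only the four lines above come from the original snippet).

#include <iostream>
#include <string>

#include "openvino/genai/llm_pipeline.hpp"

int main(int argc, char* argv[]) {
    if (argc != 2) {
        std::cerr << "Usage: " << argv[0] << " <MODEL_DIR>" << std::endl;
        return 1;
    }
    std::string model_path = argv[1];

    // Build the pipeline for the NPU device; other device names ("CPU", "GPU") work the same way.
    ov::genai::LLMPipeline pipe(model_path, "NPU");

    ov::genai::GenerationConfig config;
    config.max_new_tokens = 100;  // optional

    std::cout << pipe.generate("Why is the Sun yellow?", config) << std::endl;
    return 0;
}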
Thanks, Tolya. I updated the code and found that #576 was merged, but it doesn't seem to work for me. When I changed "device = 'CPU' # GPU can be used as well" to "GPU" or "NPU" in greedy_causal_lm.py or beam_search_causal_lm.py, neither of them worked. Please help check. Thanks.
Do you have a converted TinyLlama chat model that I can try? I suspect the model I converted on my side is not correct. Thanks.
Can anyone provide an update?
Hi @aoke79, @Edward-Lin, the following snippet should work with a regular OpenVINO model (the same as for other plugins):
ov::genai::LLMPipeline pipe(model_path, "NPU");
ov::genai::GenerationConfig config;
config.max_new_tokens = 100; // optional
std::cout << pipe.generate("Why is the Sun yellow?", config) << std::endl;
Additionally, I'd also expect chat_sample to work with the latest master.
As for greedy_causal_lm.py and beam_search_causal_lm.py -- they weren't considered during the integration; perhaps they will be enabled in the future.
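For reference, the chat mode mentioned above essentially wraps generate() calls between start_chat() and finish_chat(); a rough sketch along the lines of chat_sample (device name and generation settings are illustrative) is:

#include <iostream>
#include <string>

#include "openvino/genai/llm_pipeline.hpp"

int main(int argc, char* argv[]) {
    ov::genai::LLMPipeline pipe(argv[1], "NPU");

    ov::genai::GenerationConfig config;
    config.max_new_tokens = 100;

    pipe.start_chat();  // keep the conversation history between turns
    std::string prompt;
    std::cout << "question:\n";
    while (std::getline(std::cin, prompt)) {
        std::cout << pipe.generate(prompt, config) << "\n----------\nquestion:\n";
    }
    pipe.finish_chat();  // drop the accumulated history
    return 0;
}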
I've run it on GPU successfully through GenAI. Could you please share the converted models for NPU, like TinyLlama-1.1B-Chat-v1.0? I ran the same code with the same model: it failed on NPU but works on GPU. Thanks a lot.
(env_ov_genai) c:\Users\Administrator\Documents\Intel\OpenVINO\openvino_cpp_samples_build\intel64\Release>chat_sample.exe C:\AIGC\openvino\Optimum\tiny-int4-sym-npu
-----main----- question: why the Sun is yellow? Check 'num_inputs == 4 || num_inputs == 3' failed at src/cpp/src/llm_pipeline.cpp:207: Model should have 3 or 4 inputs: either (input_ids, attention_mask, beam_idx) or (input_ids, attention_mask, position_ids, beam_idx) but you have '47' inputs
Do you know where I can download an NPU-compatible model? Thanks.
@aoke79 Hi! You don't need NPU-compatible models; you should be able to use the same models as for the CPU / GPU devices. Please make sure your GenAI version contains the following change: https://github.com/openvinotoolkit/openvino.genai/pull/576
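A note on the error above: "you have '47' inputs" usually means the model was exported without a stateful KV-cache, so every past_key_values tensor shows up as a separate input (for TinyLlama, 22 layers x 2 caches plus input_ids, attention_mask and position_ids gives exactly 47), and LLMPipeline rejects such models. Re-exporting with a recent optimum-intel, which produces stateful models by default, should give the expected 3- or 4-input IR; a typical command (the output directory name is illustrative) is:

optimum-cli export openvino --model TinyLlama/TinyLlama-1.1B-Chat-v1.0 --weight-format int4 TinyLlama-1.1B-Chat-v1.0-int4-ov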
Thanks for the response, but I don't think so; here is my git log, which I suppose includes #579:

commit 0c2b68e469008fcc33f53da884d3e3b87df8dad0 (HEAD -> master, origin/master, origin/HEAD)
Author: Zlobin Vladimir [email protected]
Date:   Fri Jul 19 13:27:33 2024 +0400

    rm .github/ISSUE_TEMPLATE (#646)

    GenAI issues found by the commpunity tend to be crated using that
    template which isn't correct because they usually expect us to address
    them.

commit fcc309ef00ef0020a8a93bf1f7e08664eb6d2bcb (origin/gh-readonly-queue/master/pr-643-7f5e8d293468754e148c274533b7c3e790b78198)
Author: Pavel Esir [email protected]
Date:   Thu Jul 18 11:30:31 2024 +0200

    add testing chat_templates for models from continuous batching (#643)

    and added missing chat_templates for models from
    https://github.com/ilya-lavrenov/openvino.genai/blob/ct-beam-search/text_generation/causal_lm/cpp/continuous_batching/python/tests/models/real_models

    Missing models were:
    `mistralai/Mistral-7B-Instruct-v0.1`
    `microsoft/Phi-3-mini-4k-instruct, microsoft/Phi-3-mini-128k-instruct` - same templates
    `THUDM/chatglm3-6b`

    Mistral will be added separately. Also increased priority to enable
    apply_chat_template firstly for CB models from the list above.
One more thing: I found I'm not able to compile the C++ code based on this GenAI code base. Here are the logs:

(env_ov_genai) C:\AIGC\openvino\openvino.genai\samples\cpp>build_samples_msvc.bat
-- Building for: Visual Studio 17 2022
-- Selecting Windows SDK version 10.0.22621.0 to target Windows 10.0.22631.
-- The C compiler identification is MSVC 19.39.33523.0
-- The CXX compiler identification is MSVC 19.39.33523.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: C:/Program Files/Microsoft Visual Studio/2022/Community/VC/Tools/MSVC/14.39.33519/bin/Hostx64/x64/cl.exe - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: C:/Program Files/Microsoft Visual Studio/2022/Community/VC/Tools/MSVC/14.39.33519/bin/Hostx64/x64/cl.exe - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
CMake Error at CMakeLists.txt:124 (add_subdirectory):
  add_subdirectory given source "common" which is not an existing directory.
CMake Error at beam_search_causal_lm/CMakeLists.txt:6 (find_package):
  Could not find a package configuration file provided by "OpenVINOGenAI"
  with any of the following names:

    OpenVINOGenAIConfig.cmake
    openvinogenai-config.cmake

  Add the installation prefix of "OpenVINOGenAI" to CMAKE_PREFIX_PATH or set
  "OpenVINOGenAI_DIR" to a directory containing one of the above files. If
  "OpenVINOGenAI" provides a separate development package or SDK, be sure it
  has been installed.

-- Configuring incomplete, errors occurred!
I downloaded a package from the GenAI official website, "openvino_genai_windows_2024.2.0.0_x86_64.zip", and with it I can compile the C++ sample code, but I don't know how to compile the GenAI C++ code itself. Can you please tell me how to compile the GenAI C++ code? Is there any guide?
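One thing that may help with the "Could not find ... OpenVINOGenAI" error when building samples/cpp from the repository: the samples look up an installed GenAI package via find_package, so the extracted archive has to be discoverable before configuring. A rough sketch (paths are illustrative, and whether setupvars.bat alone is sufficient depends on the package layout):

call C:\path\to\openvino_genai_windows_2024.2.0.0_x86_64\setupvars.bat
cd C:\AIGC\openvino\openvino.genai\samples\cpp
build_samples_msvc.bat

If CMake still cannot find the package, the error message itself gives the fallback: add the extracted archive's installation prefix to CMAKE_PREFIX_PATH, or point OpenVINOGenAI_DIR at the directory containing OpenVINOGenAIConfig.cmake.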
@Wovchena FYI
@aoke79 Could you clarify what model you are using and which GenAI code snippet, please?
It would be great if you could provide the optimum-cli command line so I could generate the model on my side.
Dear, maybe there is a misunderstanding here. First of all, I tried to use the code you provided, shown below. I believe it's C++ code, but I don't know how to compile it. Can you please show me how to compile it?
ov::genai::LLMPipeline pipe(model_path, "NPU");
ov::genai::GenerationConfig config;
config.max_new_tokens = 100; // optional
std::cout << pipe.generate("Why is the Sun yellow?", config) << std::endl;
Secondly, I downloaded a GenAI package, which has the "build_samples_msvc.bat" file to compile the C++ code, and I got the exe files with it, but it still failed to load the model on NPU. So I guess that:
- maybe the code is not the latest, so I've asked how to compile the sample code from the repository;
- maybe I need a specifically converted model for NPU, so I asked whether you could provide one.
So could you please share complete steps that I can follow to run the test, not just some suggestions?
Thanks very much
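On the "how to compile it" question: the snippet is plain C++ against the GenAI library, so besides build_samples_msvc.bat it can also be built as a tiny standalone CMake project. A minimal sketch (project and file names are illustrative), assuming the GenAI package from the archive is discoverable via CMAKE_PREFIX_PATH or the archive's environment script:

cmake_minimum_required(VERSION 3.15)
project(npu_llm_sample CXX)

# Finds OpenVINOGenAIConfig.cmake from the installed/extracted GenAI package.
find_package(OpenVINOGenAI REQUIRED)

add_executable(npu_llm_sample main.cpp)  # main.cpp holds the snippet above
target_link_libraries(npu_llm_sample PRIVATE openvino::genai)
set_target_properties(npu_llm_sample PROPERTIES CXX_STANDARD 17)

Then configure and build as usual, for example: cmake -S . -B build followed by cmake --build build --config Release.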
https://hf-mirror.com/TinyLlama/TinyLlama-1.1B-Chat-v1.0
Hi @aoke79, if it helps, you can take a look at the following PR for guidance on LLMs with NPU: https://github.com/openvinotoolkit/openvino/pull/25841. There is also the "Run LLMs with OpenVINO GenAI Flavor on NPU" guide. Hope this helps.
@aoke79 There is also step-by-step guide to run benchmark_genai: https://github.com/openvinotoolkit/openvino.genai/issues/834