openvino.genai
Is there sample code to run on NPU?
Where can I get sample code to run an LLM on NPU? Thanks.
https://github.com/openvinotoolkit/openvino.genai/pull/576
#576
What do you mean?
I don't understand what you mean.
@Edward-Lin the referenced PR (not yet merged to the master branch) seems to contain code changes to support the LLM pipeline with static shapes out-of-the-box for the NPU plugin. Feel free to try the PR to run chat_sample, or wait until it is merged to master. Hope this helps.
Thanks. BTW, what is a PR, and how can I get it?
I think I've got it and will give it a try. I will update later. Thanks.
https://github.com/TolyaTalamanov/openvino.genai/tree/at/static-llm-pipeline-out-of-the-box I've checked out the code, but it only supports the C++ version, so I need to try to compile it, and I'm not sure whether it can run on NPU or not. From #576, it looks like it should not work yet.
Hi! @aoke79,
Once https://github.com/openvinotoolkit/openvino.genai/pull/576/ is merged, you will be able to run LLMPipeline on NPU out-of-the-box by using the following code snippet:
ov::genai::LLMPipeline pipe(model_path, "NPU");
ov::genai::GenerationConfig config;
config.max_new_tokens = 100; // optional
std::cout << pipe.generate("Why is the Sun yellow?", config) << std::endl;
Unfortunately, it doesn't support chat mode yet (it will be introduced in https://github.com/openvinotoolkit/openvino.genai/pull/580), so chat_sample.cpp cannot be used so far.
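For anyone trying that snippet standalone: a minimal self-contained sketch might look like the following (the include, the command-line handling and the main() wrapper are added here for illustration; only the four lines above come from the original snippet).

#include <iostream>
#include <string>

#include "openvino/genai/llm_pipeline.hpp"

int main(int argc, char* argv[]) {
    if (argc != 2) {
        std::cerr << "Usage: " << argv[0] << " <MODEL_DIR>" << std::endl;
        return 1;
    }
    std::string model_path = argv[1];

    // Build the pipeline for the NPU device; other device names ("CPU", "GPU") work the same way.
    ov::genai::LLMPipeline pipe(model_path, "NPU");

    ov::genai::GenerationConfig config;
    config.max_new_tokens = 100;  // optional

    std::cout << pipe.generate("Why is the Sun yellow?", config) << std::endl;
    return 0;
}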
Thanks, Tolya. I updated the code and found that #576 was merged, but it doesn't seem to work for me. When I changed "device = 'CPU' # GPU can be used as well" to "GPU" or "NPU" in greedy_causal_lm.py or beam_search_causal_lm.py, neither of them worked. Please help check. Thanks.
Do you have a converted TinyLlama chat model that I can try? I suspect the model I converted on my side is not correct. Thanks.
Can anyone provide an update?
Hi @aoke79, @Edward-Lin, the following snippet should work with a regular OpenVINO model (the same as for other plugins):
ov::genai::LLMPipeline pipe(model_path, "NPU");
ov::genai::GenerationConfig config;
config.max_new_tokens = 100; // optional
std::cout << pipe.generate("Why is the Sun yellow?", config) << std::endl;
Additionally, I'd also expect chat_sample to work with the latest master.
As for greedy_causal_lm.py and beam_search_causal_lm.py -- they weren't considered during the integration; perhaps they will be enabled in the future.
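For reference, the chat mode mentioned above essentially wraps generate() calls between start_chat() and finish_chat(); a rough sketch along the lines of chat_sample (device name and generation settings are illustrative) is:

#include <iostream>
#include <string>

#include "openvino/genai/llm_pipeline.hpp"

int main(int argc, char* argv[]) {
    ov::genai::LLMPipeline pipe(argv[1], "NPU");

    ov::genai::GenerationConfig config;
    config.max_new_tokens = 100;

    pipe.start_chat();  // keep the conversation history between turns
    std::string prompt;
    std::cout << "question:\n";
    while (std::getline(std::cin, prompt)) {
        std::cout << pipe.generate(prompt, config) << "\n----------\nquestion:\n";
    }
    pipe.finish_chat();  // drop the accumulated history
    return 0;
}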
I've run it on GPU successfully through GenAI. Could you please share the converted models for NPU, like TinyLlama-1.1B-Chat-v1.0? I ran the same code with the same model: it failed on NPU but works on GPU. Thanks a lot.
(env_ov_genai) c:\Users\Administrator\Documents\Intel\OpenVINO\openvino_cpp_samples_build\intel64\Release>chat_sample.exe C:\AIGC\openvino\Optimum\tiny-int4-sym-npu
-----main----- question: why the Sun is yellow? Check 'num_inputs == 4 || num_inputs == 3' failed at src/cpp/src/llm_pipeline.cpp:207: Model should have 3 or 4 inputs: either (input_ids, attention_mask, beam_idx) or (input_ids, attention_mask, position_ids, beam_idx) but you have '47' inputs
Do you know where I can download an NPU-compatible model? Thanks.
@aoke79 Hi! You don't need NPU-compatible models; you should be able to use the same models as for the CPU / GPU devices. Please make sure your GenAI version contains the following change: https://github.com/openvinotoolkit/openvino.genai/pull/576
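A note on the error above: "you have '47' inputs" usually means the model was exported without a stateful KV-cache, so every past_key_values tensor shows up as a separate input (for TinyLlama, 22 layers x 2 caches plus input_ids, attention_mask and position_ids gives exactly 47), and LLMPipeline rejects such models. Re-exporting with a recent optimum-intel, which produces stateful models by default, should give the expected 3- or 4-input IR; a typical command (the output directory name is illustrative) is:

optimum-cli export openvino --model TinyLlama/TinyLlama-1.1B-Chat-v1.0 --weight-format int4 TinyLlama-1.1B-Chat-v1.0-int4-ov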
Thanks for the response, but I don't think so; here is my git log, which I suppose includes #579:

commit 0c2b68e469008fcc33f53da884d3e3b87df8dad0 (HEAD -> master, origin/master, origin/HEAD)
Author: Zlobin Vladimir [email protected]
Date:   Fri Jul 19 13:27:33 2024 +0400

    rm .github/ISSUE_TEMPLATE (#646)

    GenAI issues found by the commpunity tend to be crated using that
    template which isn't correct because they usually expect us to address
    them.

commit fcc309ef00ef0020a8a93bf1f7e08664eb6d2bcb (origin/gh-readonly-queue/master/pr-643-7f5e8d293468754e148c274533b7c3e790b78198)
Author: Pavel Esir [email protected]
Date:   Thu Jul 18 11:30:31 2024 +0200

    add testing chat_templates for models from continuous batching (#643)

    and added missing chat_templates for models from
    https://github.com/ilya-lavrenov/openvino.genai/blob/ct-beam-search/text_generation/causal_lm/cpp/continuous_batching/python/tests/models/real_models

    Missing models were:
    `mistralai/Mistral-7B-Instruct-v0.1`
    `microsoft/Phi-3-mini-4k-instruct, microsoft/Phi-3-mini-128k-instruct` - same templates
    `THUDM/chatglm3-6b`

    Mistral will be added separately. Also increased priority to enable
    apply_chat_template firstly for CB models from the list above.
One more thing: I found I'm not able to compile the C++ code based on this GenAI code base. Here are the logs:

(env_ov_genai) C:\AIGC\openvino\openvino.genai\samples\cpp>build_samples_msvc.bat
-- Building for: Visual Studio 17 2022
-- Selecting Windows SDK version 10.0.22621.0 to target Windows 10.0.22631.
-- The C compiler identification is MSVC 19.39.33523.0
-- The CXX compiler identification is MSVC 19.39.33523.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: C:/Program Files/Microsoft Visual Studio/2022/Community/VC/Tools/MSVC/14.39.33519/bin/Hostx64/x64/cl.exe - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: C:/Program Files/Microsoft Visual Studio/2022/Community/VC/Tools/MSVC/14.39.33519/bin/Hostx64/x64/cl.exe - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
CMake Error at CMakeLists.txt:124 (add_subdirectory):
  add_subdirectory given source "common" which is not an existing directory.
CMake Error at beam_search_causal_lm/CMakeLists.txt:6 (find_package):
  Could not find a package configuration file provided by "OpenVINOGenAI"
  with any of the following names:

    OpenVINOGenAIConfig.cmake
    openvinogenai-config.cmake

  Add the installation prefix of "OpenVINOGenAI" to CMAKE_PREFIX_PATH or set
  "OpenVINOGenAI_DIR" to a directory containing one of the above files. If
  "OpenVINOGenAI" provides a separate development package or SDK, be sure it
  has been installed.

-- Configuring incomplete, errors occurred!
I downloaded a package from the GenAI official website, "openvino_genai_windows_2024.2.0.0_x86_64.zip", and with it I can compile the C++ sample code, but I don't know how to compile the GenAI C++ code itself. Can you please tell me how to compile the GenAI C++ code? Is there any guide?
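One thing that may help with the "Could not find ... OpenVINOGenAI" error when building samples/cpp from the repository: the samples look up an installed GenAI package via find_package, so the extracted archive has to be discoverable before configuring. A rough sketch (paths are illustrative, and whether setupvars.bat alone is sufficient depends on the package layout):

call C:\path\to\openvino_genai_windows_2024.2.0.0_x86_64\setupvars.bat
cd C:\AIGC\openvino\openvino.genai\samples\cpp
build_samples_msvc.bat

If CMake still cannot find the package, the error message itself gives the fallback: add the extracted archive's installation prefix to CMAKE_PREFIX_PATH, or point OpenVINOGenAI_DIR at the directory containing OpenVINOGenAIConfig.cmake.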
@Wovchena FYI
@aoke79 Could you clarify what model you are using and which GenAI code snippet, please?
It would be great if you could provide the optimum-cli command line so I could generate the model on my side.
Dear, maybe there is a misunderstanding here. First of all, I tried to use the code you provided, shown below. I believe it's C++ code, but I don't know how to compile it. Can you please show me how to compile it?
ov::genai::LLMPipeline pipe(model_path, "NPU");
ov::genai::GenerationConfig config;
config.max_new_tokens = 100; // optional
std::cout << pipe.generate("Why is the Sun yellow?", config) << std::endl;
Secondly, I downloaded a GenAI package, which has the "build_samples_msvc.bat" file to compile the C++ code, and I got the exe files with it, but it still failed to load the model on NPU. So I guess that:
- maybe the code is not the latest, so I've asked how to compile the sample code from the repository;
- maybe I need a specifically converted model for NPU, so I asked whether you could provide one.
So could you please share complete steps that I can follow to run the test, not just some suggestions?
Thanks very much
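On the "how to compile it" question: the snippet is plain C++ against the GenAI library, so besides build_samples_msvc.bat it can also be built as a tiny standalone CMake project. A minimal sketch (project and file names are illustrative), assuming the GenAI package from the archive is discoverable via CMAKE_PREFIX_PATH or the archive's environment script:

cmake_minimum_required(VERSION 3.15)
project(npu_llm_sample CXX)

# Finds OpenVINOGenAIConfig.cmake from the installed/extracted GenAI package.
find_package(OpenVINOGenAI REQUIRED)

add_executable(npu_llm_sample main.cpp)  # main.cpp holds the snippet above
target_link_libraries(npu_llm_sample PRIVATE openvino::genai)
set_target_properties(npu_llm_sample PROPERTIES CXX_STANDARD 17)

Then configure and build as usual, for example: cmake -S . -B build followed by cmake --build build --config Release.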
https://hf-mirror.com/TinyLlama/TinyLlama-1.1B-Chat-v1.0
Hi @aoke79, if it helps, you can take a look at the following PR for guidance on LLMs with NPU: https://github.com/openvinotoolkit/openvino/pull/25841. There is also the "Run LLMs with OpenVINO GenAI Flavor on NPU" guide. Hope this helps.
@aoke79 There is also step-by-step guide to run benchmark_genai: https://github.com/openvinotoolkit/openvino.genai/issues/834