
Is there sample code to run on NPU?

Edward-Lin opened this issue 1 year ago • 24 comments

Where can I get sample code to run an LLM on NPU? Thanks.

Edward-Lin avatar Jul 05 '24 05:07 Edward-Lin

https://github.com/openvinotoolkit/openvino.genai/pull/576

Wovchena avatar Jul 05 '24 09:07 Wovchena

> #576

what do you mean?

Edward-Lin avatar Jul 06 '24 09:07 Edward-Lin

I don't know what you mean.

Edward-Lin avatar Jul 06 '24 09:07 Edward-Lin

@Edward-Lin the referenced PR (not merged to the master branch yet) seems to contain code changes to support the LLM pipeline with static shapes out-of-the-box for the NPU plugin. Feel free to try the PR to run chat_sample, or wait until it is merged to the master branch. Hope this helps.

avitial avatar Jul 09 '24 17:07 avitial

> @Edward-Lin the referenced PR (not merged to the master branch yet) seems to contain code changes to support the LLM pipeline with static shapes out-of-the-box for the NPU plugin. Feel free to try the PR to run chat_sample, or wait until it is merged to the master branch. Hope this helps.

Thanks. BTW, what is a PR, and how can I get it?

aoke79 avatar Jul 10 '24 01:07 aoke79
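A PR is a GitHub pull request: a proposed set of commits that hasn't been merged yet. One generic way to try a PR locally is to fetch its head ref; a minimal sketch using #576 (the local branch name pr-576 is arbitrary):

git clone https://github.com/openvinotoolkit/openvino.genai.git
cd openvino.genai
# GitHub exposes every pull request as refs/pull/<id>/head
git fetch origin pull/576/head:pr-576
git checkout pr-576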

I think I've got it and will give it a try. Will update later. Thanks.

aoke79 avatar Jul 10 '24 01:07 aoke79

https://github.com/TolyaTalamanov/openvino.genai/tree/at/static-llm-pipeline-out-of-the-box I've checked out the code, but it only supports the C++ version, so I need to try to compile it, and I'm not sure whether it can run on NPU or not. But from #576, it should not work yet.


aoke79 avatar Jul 10 '24 01:07 aoke79

Hi! @aoke79,

Once https://github.com/openvinotoolkit/openvino.genai/pull/576/ is merged, you will be able to run LLMPipeline on NPU out-of-the-box using the following code snippet:

ov::genai::LLMPipeline pipe(model_path, "NPU");
ov::genai::GenerationConfig config;
config.max_new_tokens = 100; // optional
std::cout << pipe.generate("Why is the Sun yellow?", config) << std::endl;

Unfortunately, it doesn't support chat mode (that will be introduced in https://github.com/openvinotoolkit/openvino.genai/pull/580), so chat_sample.cpp cannot be used so far.

TolyaTalamanov avatar Jul 11 '24 09:07 TolyaTalamanov
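For context, a complete program around that snippet might look as follows. This is a sketch assuming the public llm_pipeline.hpp header and a model path passed on the command line, both inferred from the openvino.genai samples rather than stated in this thread:

#include <iostream>
#include <string>

#include "openvino/genai/llm_pipeline.hpp"

int main(int argc, char* argv[]) {
    // Directory containing an OpenVINO IR exported by optimum-cli
    std::string model_path = argv[1];
    // "NPU" selects the NPU plugin; "CPU" and "GPU" work the same way
    ov::genai::LLMPipeline pipe(model_path, "NPU");
    ov::genai::GenerationConfig config;
    config.max_new_tokens = 100;  // optional cap on the number of generated tokens
    std::cout << pipe.generate("Why is the Sun yellow?", config) << std::endl;
    return 0;
}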

Thanks, Tolya. I updated the code and found #576 was merged, but it doesn't seem to work for me. When I changed "device = 'CPU' # GPU can be used as well" to "GPU" or "NPU" in greedy_causal_lm.py or beam_search_causal_lm.py, neither of them worked. Please help check. Thanks.

aoke79 avatar Jul 12 '24 07:07 aoke79

Do you have a converted TinyLlama chat model that I can try? I suppose the model I converted on my side is not correct. Thanks.

Edward-Lin avatar Jul 15 '24 05:07 Edward-Lin

Can anyone give an update?

aoke79 avatar Jul 18 '24 11:07 aoke79

> Thanks, Tolya. I updated the code and found #576 was merged, but it doesn't seem to work for me. When I changed "device = 'CPU' # GPU can be used as well" to "GPU" or "NPU" in greedy_causal_lm.py or beam_search_causal_lm.py, neither of them worked. Please help check. Thanks.

Hi @aoke79, @Edward-Lin, the following snippet should work with a regular OpenVINO model (same as for other plugins):

ov::genai::LLMPipeline pipe(model_path, "NPU");
ov::genai::GenerationConfig config;
config.max_new_tokens = 100; // optional
std::cout << pipe.generate("Why is the Sun yellow?", config) << std::endl;

Additionally, I'd expect chat_sample to work with the latest master.

As for greedy_causal_lm.py and beam_search_causal_lm.py -- they weren't considered during integration; perhaps they will be enabled in the future.

TolyaTalamanov avatar Jul 19 '24 12:07 TolyaTalamanov

I've run it on GPU OK through GenAI. Could you please share converted models for NPU, like TinyLlama-1.1B-Chat-v1.0? I ran the same code with the same model; it failed on NPU but was OK on GPU. Thanks a lot.

aoke79 avatar Jul 22 '24 03:07 aoke79

(env_ov_genai) c:\Users\Administrator\Documents\Intel\OpenVINO\openvino_cpp_samples_build\intel64\Release>chat_sample.exe C:\AIGC\openvino\Optimum\tiny-int4-sym-npu

-----main----- question: why the Sun is yellow? Check 'num_inputs == 4 || num_inputs == 3' failed at src/cpp/src/llm_pipeline.cpp:207: Model should have 3 or 4 inputs: either (input_ids, attention_mask, beam_idx) or (input_ids, attention_mask, position_ids, beam_idx) but you have '47' inputs


Do you know where I can download an NPU-compatible model? Thanks.

aoke79 avatar Jul 22 '24 04:07 aoke79

> (env_ov_genai) c:\Users\Administrator\Documents\Intel\OpenVINO\openvino_cpp_samples_build\intel64\Release>chat_sample.exe C:\AIGC\openvino\Optimum\tiny-int4-sym-npu
>
> -----main----- question: why the Sun is yellow? Check 'num_inputs == 4 || num_inputs == 3' failed at src/cpp/src/llm_pipeline.cpp:207: Model should have 3 or 4 inputs: either (input_ids, attention_mask, beam_idx) or (input_ids, attention_mask, position_ids, beam_idx) but you have '47' inputs
>
> Do you know where I can download an NPU-compatible model? Thanks.

@aoke79 Hi! You don't need NPU-compatible models; you should be able to use the same models as for CPU / GPU devices. Please make sure your GenAI version contains the following change: https://github.com/openvinotoolkit/openvino.genai/pull/576

TolyaTalamanov avatar Jul 23 '24 08:07 TolyaTalamanov
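A note on the error above: the check expects a stateful model with 3 or 4 inputs, while 47 inputs suggests the IR was exported in the older non-stateful form, with per-layer KV-cache tensors as explicit inputs. A plausible fix, assuming a recent optimum-intel (which exports stateful models by default; the output folder name here is arbitrary):

pip install --upgrade "optimum[openvino]"
optimum-cli export openvino --model TinyLlama/TinyLlama-1.1B-Chat-v1.0 TinyLlama-1.1B-Chat-v1.0-ov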

Thanks for the response, but I don't think so. Here is my git log; I suppose it supports 579:

commit 0c2b68e469008fcc33f53da884d3e3b87df8dad0 (HEAD -> master, origin/master, origin/HEAD)
Author: Zlobin Vladimir [email protected]
Date: Fri Jul 19 13:27:33 2024 +0400

rm .github/ISSUE_TEMPLATE (#646)

GenAI issues found by the commpunity tend to be crated using that
template which isn't correct because they usually expect us to address
them.

commit fcc309ef00ef0020a8a93bf1f7e08664eb6d2bcb (origin/gh-readonly-queue/master/pr-643-7f5e8d293468754e148c274533b7c3e790b78198)
Author: Pavel Esir [email protected]
Date: Thu Jul 18 11:30:31 2024 +0200

add testing chat_templates for models from continuous batching (#643)

and added missing chat_templates for models from
https://github.com/ilya-lavrenov/openvino.genai/blob/ct-beam-search/text_generation/causal_lm/cpp/continuous_batching/python/tests/models/real_models

Missing models were:
`mistralai/Mistral-7B-Instruct-v0.1`
`microsoft/Phi-3-mini-4k-instruct, microsoft/Phi-3-mini-128k-instruct` -
same templates
`THUDM/chatglm3-6b`

Mistral will be added separately. Also increased priority to enable
apply_chat_template firstly for CB models from the list above.

aoke79 avatar Jul 23 '24 09:07 aoke79

One more thing: I found I'm not able to compile the C++ code based on this GenAI code base. Here are the logs:

(env_ov_genai) C:\AIGC\openvino\openvino.genai\samples\cpp>build_samples_msvc.bat
-- Building for: Visual Studio 17 2022
-- Selecting Windows SDK version 10.0.22621.0 to target Windows 10.0.22631.
-- The C compiler identification is MSVC 19.39.33523.0
-- The CXX compiler identification is MSVC 19.39.33523.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: C:/Program Files/Microsoft Visual Studio/2022/Community/VC/Tools/MSVC/14.39.33519/bin/Hostx64/x64/cl.exe - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: C:/Program Files/Microsoft Visual Studio/2022/Community/VC/Tools/MSVC/14.39.33519/bin/Hostx64/x64/cl.exe - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
CMake Error at CMakeLists.txt:124 (add_subdirectory):
  add_subdirectory given source "common" which is not an existing directory.

CMake Error at beam_search_causal_lm/CMakeLists.txt:6 (find_package):
  Could not find a package configuration file provided by "OpenVINOGenAI"
  with any of the following names:

    OpenVINOGenAIConfig.cmake
    openvinogenai-config.cmake

  Add the installation prefix of "OpenVINOGenAI" to CMAKE_PREFIX_PATH or set
  "OpenVINOGenAI_DIR" to a directory containing one of the above files. If
  "OpenVINOGenAI" provides a separate development package or SDK, be sure it
  has been installed.

-- Configuring incomplete, errors occurred!

aoke79 avatar Jul 23 '24 09:07 aoke79
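The CMake failures above are consistent with building samples/cpp standalone, without an OpenVINO GenAI package on CMAKE_PREFIX_PATH. A sketch of building the whole repository instead, based on its build instructions at the time (paths are illustrative, and an OpenVINO developer package must already be set up via setupvars):

:: run from a Developer Command Prompt after calling OpenVINO's setupvars.bat
git clone --recursive https://github.com/openvinotoolkit/openvino.genai.git
cd openvino.genai
cmake -DCMAKE_BUILD_TYPE=Release -S . -B build
cmake --build build --config Release -j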

I downloaded a package ("openvino_genai_windows_2024.2.0.0_x86_64.zip") from the GenAI official website, and with it I can compile the C++ samples, but I don't know how to compile the GenAI C++ code from this repo. Can you please tell me how to compile it? Is there any guide?

aoke79 avatar Jul 23 '24 09:07 aoke79

> I downloaded a package ("openvino_genai_windows_2024.2.0.0_x86_64.zip") from the GenAI official website, and with it I can compile the C++ samples, but I don't know how to compile the GenAI C++ code from this repo. Can you please tell me how to compile it? Is there any guide?

@Wovchena FYI

TolyaTalamanov avatar Jul 23 '24 09:07 TolyaTalamanov

> Thanks for the response, but I don't think so. Here is my git log; I suppose it supports 579:
>
> commit 0c2b68e (HEAD -> master, origin/master, origin/HEAD)
> Author: Zlobin Vladimir [email protected]
> Date: Fri Jul 19 13:27:33 2024 +0400
>
> rm .github/ISSUE_TEMPLATE (#646)
>
> GenAI issues found by the commpunity tend to be crated using that
> template which isn't correct because they usually expect us to address
> them.
>
> commit fcc309e (origin/gh-readonly-queue/master/pr-643-7f5e8d293468754e148c274533b7c3e790b78198)
> Author: Pavel Esir [email protected]
> Date: Thu Jul 18 11:30:31 2024 +0200
>
> add testing chat_templates for models from continuous batching (#643)
>
> and added missing chat_templates for models from
> https://github.com/ilya-lavrenov/openvino.genai/blob/ct-beam-search/text_generation/causal_lm/cpp/continuous_batching/python/tests/models/real_models
>
> Missing models were:
> `mistralai/Mistral-7B-Instruct-v0.1`
> `microsoft/Phi-3-mini-4k-instruct, microsoft/Phi-3-mini-128k-instruct` -
> same templates
> `THUDM/chatglm3-6b`
>
> Mistral will be added separately. Also increased priority to enable
> apply_chat_template firstly for CB models from the list above.

@aoke79 Could you clarify what model you are using and share the GenAI code snippet, please? It would be great if you could provide the optimum-cli command line so I could generate the model on my side.

TolyaTalamanov avatar Jul 23 '24 09:07 TolyaTalamanov
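For reference, an int4 export command matching the "tiny-int4-sym-npu" naming the user mentioned might look like the following; the exact flags are an assumption based on optimum-cli's documented options, not taken from this thread:

# --sym requests symmetric int4 quantization (availability depends on the optimum-intel version)
optimum-cli export openvino --model TinyLlama/TinyLlama-1.1B-Chat-v1.0 --weight-format int4 --sym tiny-int4-sym-ov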

Dear, maybe there is a misunderstanding here. First of all, I tried to use the code you provided, like below. I believe it's C++ code, but I don't know how to compile it. Can you please show me how to compile it?

ov::genai::LLMPipeline pipe(model_path, "NPU");
ov::genai::GenerationConfig config;
config.max_new_tokens = 100; // optional
std::cout << pipe.generate("Why is the Sun yellow?", config) << std::endl;

Secondly, I downloaded a GenAI package, which has the build_samples_msvc.bat file to compile the C++ code. I got the exe files through it, but it still failed to load the model on NPU. So I guess that:

  1. Maybe the code is not the latest, so I've asked for a way to compile the latest sample code.
  2. Maybe I need a specifically converted model for NPU, so I asked whether you could provide one.

So could you please share complete steps that I can follow to do the test, not just some suggestions?

Thanks very much

aoke79 avatar Jul 24 '24 07:07 aoke79

https://hf-mirror.com/TinyLlama/TinyLlama-1.1B-Chat-v1.0

aoke79 avatar Jul 25 '24 01:07 aoke79

Hi @aoke79, if it helps, you can take a look at the following PR for guidance with LLMs on NPU: https://github.com/openvinotoolkit/openvino/pull/25841. Here is the Run LLMs with OpenVINO GenAI Flavor on NPU guide. Hope this helps.

avitial avatar Aug 13 '24 23:08 avitial
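Pulling the thread together, a plausible end-to-end sequence on Windows is sketched below; every path and folder name is illustrative, and the guide linked above is the authoritative reference:

:: 1. Export a stateful int4 model with a recent optimum-intel
optimum-cli export openvino --model TinyLlama/TinyLlama-1.1B-Chat-v1.0 --weight-format int4 TinyLlama-ov
:: 2. Set up the environment from the extracted GenAI release archive
call C:\openvino_genai\setupvars.bat
:: 3. Build the bundled C++ samples
cd C:\openvino_genai\samples\cpp
build_samples_msvc.bat
:: 4. Run the sample against the exported model (the device string inside the
::    sample source may need to be changed to "NPU" before building)
chat_sample.exe TinyLlama-ov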

@aoke79 There is also a step-by-step guide to run benchmark_genai: https://github.com/openvinotoolkit/openvino.genai/issues/834

TolyaTalamanov avatar Sep 10 '24 13:09 TolyaTalamanov