openvino.genai Issues with documentation and failure to tokenize

The documentation in the article has the following issues.

The following will not work:

optimum-cli export openvino --model meta-llama/Llama-3.2-3B-Instruct --task text-generation-with-past --weight-format int4 --group-size 64 --ratio 1.0 --sym --awq --scale-estimation --dataset 'wikitext2' --all-layers llama-3.2-3b-instruct-INT4

Solution: on Windows, wikitext2 must be wrapped in double quotes ("wikitext2"), not single quotes.

After about 30 minutes of installation (Windows), it checks the versions and then fails. There is no OpenVINO 2024.5 available, so the commands provided will not work.

If you downgrade you then get the following: openvino-genai 2024.5.0.0.dev20241024 requires openvino_tokenizers~=2024.5.0.0.dev, but you have openvino-tokenizers 2024.4.1.0.dev20240926 which is incompatible.

Solution: pip install openvino==2024.4.0 openvino-tokenizers==2024.4.0.0 openvino-genai==2024.4.0

Then things go a little more smoothly, until you notice this little gem on the last line: Exporting tokenizers to OpenVINO is not supported for tokenizers version > 0.19 and openvino version <= 2024.4. Please downgrade to tokenizers version <= 0.19 to export tokenizers to OpenVINO.

So you think OK, no biggy, just downgrade:

Installing collected packages: tokenizers
  Attempting uninstall: tokenizers
    Found existing installation: tokenizers 0.20.1
    Uninstalling tokenizers-0.20.1:
      Successfully uninstalled tokenizers-0.20.1
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
transformers 4.45.2 requires tokenizers<0.21,>=0.20, but you have tokenizers 0.19.0 which is incompatible.
Successfully installed tokenizers-0.19.0
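
The conflict in the two messages above can be made explicit with a small check. This is a minimal Python sketch, assuming the constraints are exactly as printed: tokenizers <= 0.19 for export with OpenVINO <= 2024.4 (read here as major.minor <= 0.19, which is an assumption) and tokenizers >=0.20,<0.21 for transformers 4.45.2. The helper names are hypothetical.

```python
def parse(version):
    """Turn '0.20.1' into (0, 20, 1) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def ok_for_export(tokenizers_version):
    # Assumption: "tokenizers version <= 0.19" is read as major.minor <= 0.19.
    return parse(tokenizers_version)[:2] <= (0, 19)


def ok_for_transformers(tokenizers_version):
    # transformers 4.45.2 requires tokenizers<0.21,>=0.20 (from the pip error).
    return parse("0.20") <= parse(tokenizers_version) < parse("0.21")


# The two versions that appear in the pip output above.
for candidate in ("0.19.0", "0.20.1"):
    print(candidate, ok_for_export(candidate), ok_for_transformers(candidate))
```

Neither of the two versions passes both checks, so downgrading tokenizers alone cannot satisfy both packages in this environment.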

After deleting the venv and starting again:

(ov-env) C:\LLMs\openvino.genai>pip list
Package             Version
------------------- ----------------------
numpy               2.1.2
openvino            2024.5.0.dev20241024
openvino-genai      2024.5.0.0.dev20241024
openvino-telemetry  2024.1.0
openvino-tokenizers 2024.5.0.0.dev20241024
packaging           24.1
pip                 24.2

All good so far. After running the optimum-cli command:

OpenVINO and OpenVINO Tokenizers versions are not binary compatible.
OpenVINO version:            2024.4.0-16579
OpenVINO Tokenizers version: 2024.5.0.0
First 3 numbers should be the same. Update OpenVINO Tokenizers to compatible version. It is recommended to use the same day builds for pre-release version. To install both OpenVINO and OpenVINO Tokenizers release version perform:
pip install --force-reinstall openvino openvino-tokenizers
To update both OpenVINO and OpenVINO Tokenizers to the latest pre-release version perform:
pip install --pre -U openvino openvino-tokenizers --extra-index-url https://storage.openvinotoolkit.org/simple/wheels/nightly
Tokenizer won't be converted.
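
The rule quoted in the message ("First 3 numbers should be the same") can be checked by hand. The following is a rough Python sketch of that rule only; it is not the actual check inside openvino-tokenizers, and the function names are hypothetical.

```python
import re


def first_three(version):
    """Return the first three numeric components of a version string.

    '2024.4.0-16579' -> (2024, 4, 0)
    """
    numbers = re.findall(r"\d+", version)
    return tuple(int(n) for n in numbers[:3])


def binary_compatible(openvino_version, tokenizers_version):
    # Per the message above: the first three numbers must match.
    return first_three(openvino_version) == first_three(tokenizers_version)


# The pair reported in the error message.
print(binary_compatible("2024.4.0-16579", "2024.5.0.0"))  # False
# The pair reported by pip list.
print(binary_compatible("2024.5.0.dev20241024", "2024.5.0.0.dev20241024"))  # True
```

By this rule the versions shown by pip list are consistent with each other, while the OpenVINO version reported at export time is not the one pip list shows.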

Trying to get it back to 2024.5.0.dev20241024: pip install --force-reinstall openvino==2024.5.0.dev20241024

ERROR: Could not find a version that satisfies the requirement openvino==2024.5.0.dev20241024 (from versions: 2024.1.0, 2024.2.0, 2024.3.0, 2024.4.0, 2024.4.1.dev20240926)
ERROR: No matching distribution found for openvino==2024.5.0.dev20241024

Yet pip list shows:

openvino                  2024.5.0.dev20241024
openvino-genai            2024.5.0.0.dev20241024
openvino-telemetry        2024.1.0
openvino-tokenizers       2024.5.0.0.dev20241024
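
Because pip list and the export-time message report different OpenVINO versions, it may help to confirm which interpreter is running and which package versions it sees. This is only a diagnostic sketch using the standard library; the thread does not establish the cause of the mismatch, and the function name is hypothetical.

```python
import sys
from importlib import metadata

PACKAGES = ("openvino", "openvino-tokenizers", "openvino-genai")


def installed_versions(names=PACKAGES):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Show which Python executable is active, then what it can import metadata for.
print("interpreter:", sys.executable)
for name, version in installed_versions().items():
    print(f"{name:22} {version}")
```

Running this with the same Python that runs optimum-cli shows whether that environment matches the one pip list was run in.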

Oct 25 '24 01:10 AdamMiltonBarker

@AdamMiltonBarker could you please provide a link to the article you followed?

Oct 25 '24 04:10 eaidova

@AdamMiltonBarker it looks like you were following this one, right? If so, the command already has double quotes.

As for the tokenizers issue: to be compatible with transformers 4.45 and higher, you should use nightly versions of the components, or downgrade transformers to avoid conflicts.

Oct 25 '24 06:10 andrei-kochin

Hi Andrej. It did not last night. If you look in the comments it was changed to '' and then I replied to say it still needs "". That part of the comment was just to help you update the article.

The main issue is that the installation is not working; someone else in the comments has the same issue.

Oct 25 '24 11:10 AdamMiltonBarker

It does not work; you end up with the initial issue. Another person in the comments on the article is facing the same problem. If the article is followed exactly, you end up with the error provided above and in the article:

OpenVINO and OpenVINO Tokenizers versions are not binary compatible.
OpenVINO version: 2024.4.0-16579
OpenVINO Tokenizers version: 2024.5.0.0
First 3 numbers should be the same. Update OpenVINO Tokenizers to compatible version. It is recommended to use the same day builds for pre-release version. To install both OpenVINO and OpenVINO Tokenizers release version perform:
pip install --force-reinstall openvino openvino-tokenizers
To update both OpenVINO and OpenVINO Tokenizers to the latest pre-release version perform:
pip install --pre -U openvino openvino-tokenizers --extra-index-url https://storage.openvinotoolkit.org/simple/wheels/nightly
Tokenizer won't be converted.
(openvino_env) C:\LLMs\openvino.genai>pip install --pre -U openvino openvino-tokenizers --extra-index-url https://storage.openvinotoolkit.org/simple/wheels/nightly
Looking in indexes: https://pypi.org/simple, https://storage.openvinotoolkit.org/simple/wheels/nightly
Requirement already satisfied: openvino in c:\llms\openvino.genai\openvino_env\lib\site-packages (2024.5.0.dev20241024)
Requirement already satisfied: openvino-tokenizers in c:\llms\openvino.genai\openvino_env\lib\site-packages (2024.5.0.0.dev20241024)
Requirement already satisfied: numpy<2.2.0,>=1.16.6 in c:\llms\openvino.genai\openvino_env\lib\site-packages (from openvino) (2.1.2)
Requirement already satisfied: openvino-telemetry>=2023.2.1 in c:\llms\openvino.genai\openvino_env\lib\site-packages (from openvino) (2024.1.0)
Requirement already satisfied: packaging in c:\llms\openvino.genai\openvino_env\lib\site-packages (from openvino) (24.1)

pip install --force-reinstall openvino openvino-tokenizers also does not work.

Oct 25 '24 11:10 AdamMiltonBarker

Something has changed in the documentation since last night. I followed the steps provided four times before attempting the fixes I listed above. Whatever was updated in the article has fixed the issue.

Oct 25 '24 12:10 AdamMiltonBarker

I am reopening this. Steps to reproduce: start from a fresh installation of the AI Dev Kit on a Khadas AI PC (https://github.com/intel/aipc-devkit-install), install GenAI per the README in this repo, then run the LLM Chatbot example:

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
openvino-genai 2024.6.0.0 requires openvino_tokenizers~=2024.6.0.0.dev, but you have openvino-tokenizers 2025.0.0.0.dev20250110 which is incompatible.

The notebook does not fail at that point and continues to this:

RuntimeError                              Traceback (most recent call last)
Cell In[14], line 4
      2 test_string = "2 + 2 ="
      3 input_tokens = tok(test_string, return_tensors="pt", **tokenizer_kwargs)
----> 4 answer = ov_model.generate(**input_tokens, max_new_tokens=2)
      5 print(tok.batch_decode(answer, skip_special_tokens=True)[0])

File C:\Intel\venv\Lib\site-packages\torch\utils\_contextlib.py:116, in context_decorator.<locals>.decorate_context(*args, **kwargs)
    113 @functools.wraps(func)
    114 def decorate_context(*args, **kwargs):
    115     with ctx_factory():
--> 116         return func(*args, **kwargs)

File C:\Intel\venv\Lib\site-packages\optimum\intel\openvino\modeling_decoder.py:726, in OVModelForCausalLM.generate(self, inputs, generation_config, logits_processor, stopping_criteria, prefix_allowed_tokens_fn, synced_gpus, assistant_model, streamer, negative_prompt_ids, negative_prompt_attention_mask, **kwargs)
    724 if is_beam_search:
    725     self._first_iter_beam_search = True
--> 726 result = super().generate(
    727     inputs,
    728     generation_config,
    729     logits_processor,
    730     stopping_criteria,
    731     prefix_allowed_tokens_fn,
    732     synced_gpus,
    733     assistant_model,
    734     streamer,
    735     negative_prompt_ids,
    736     negative_prompt_attention_mask,
    737     **kwargs,
    738 )
    739 return result

File C:\Intel\venv\Lib\site-packages\torch\utils\_contextlib.py:116, in context_decorator.<locals>.decorate_context(*args, **kwargs)
    113 @functools.wraps(func)
    114 def decorate_context(*args, **kwargs):
    115     with ctx_factory():
--> 116         return func(*args, **kwargs)

File C:\Intel\venv\Lib\site-packages\transformers\generation\utils.py:2252, in GenerationMixin.generate(self, inputs, generation_config, logits_processor, stopping_criteria, prefix_allowed_tokens_fn, synced_gpus, assistant_model, streamer, negative_prompt_ids, negative_prompt_attention_mask, **kwargs)
   2244     input_ids, model_kwargs = self._expand_inputs_for_generation(
   2245         input_ids=input_ids,
   2246         expand_size=generation_config.num_return_sequences,
   2247         is_encoder_decoder=self.config.is_encoder_decoder,
   2248         **model_kwargs,
   2249     )
   2251     # 12. run sample (it degenerates to greedy search when `generation_config.do_sample=False`)
-> 2252     result = self._sample(
   2253         input_ids,
   2254         logits_processor=prepared_logits_processor,
   2255         stopping_criteria=prepared_stopping_criteria,
   2256         generation_config=generation_config,
   2257         synced_gpus=synced_gpus,
   2258         streamer=streamer,
   2259         **model_kwargs,
   2260     )
   2262 elif generation_mode in (GenerationMode.BEAM_SAMPLE, GenerationMode.BEAM_SEARCH):
   2263     # 11. prepare beam search scorer
   2264     beam_scorer = BeamSearchScorer(
   2265         batch_size=batch_size,
   2266         num_beams=generation_config.num_beams,
   (...)
   2271         max_length=generation_config.max_length,
   2272     )

File C:\Intel\venv\Lib\site-packages\transformers\generation\utils.py:3297, in GenerationMixin._sample(self, input_ids, logits_processor, stopping_criteria, generation_config, synced_gpus, streamer, **model_kwargs)
   3295     probs = nn.functional.softmax(next_token_scores, dim=-1)
   3296     # TODO (joao): this OP throws "skipping cudagraphs due to ['incompatible ops']", find solution
-> 3297     next_tokens = torch.multinomial(probs, num_samples=1).squeeze(1)
   3298 else:
   3299     next_tokens = torch.argmax(next_token_scores, dim=-1)

File C:\Intel\venv\Lib\site-packages\nncf\torch\dynamic_graph\wrappers.py:85, in wrap_operator.<locals>.wrapped(*args, **kwargs)
     83 ctx = get_current_context()
     84 if not ctx or getattr(ctx, "in_operator", False) or not ctx.is_tracing:
---> 85     op1 = operator(*args, **kwargs)
     86     return op1
     88 ctx.in_operator = True

RuntimeError: probability tensor contains either `inf`, `nan` or element < 0
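
The final error is raised when the probabilities passed to torch.multinomial contain inf, nan, or a negative value. The sketch below applies the same condition to a plain Python list so it runs without torch; the function name is hypothetical, and the thread does not establish why the model produced invalid probabilities here.

```python
import math


def invalid_entries(probs):
    """Return (index, value) pairs that are inf, nan, or negative."""
    return [
        (i, p)
        for i, p in enumerate(probs)
        if not math.isfinite(p) or p < 0
    ]


print(invalid_entries([0.7, 0.3]))  # []
# Entries 1, 2 and 3 below would each trigger the error condition.
print(invalid_entries([0.5, float("nan"), -0.1, float("inf")]))
```

On real tensors, a comparable inspection could be done on the logits or probabilities before sampling, for example with torch.isfinite.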

I have been successful with creating a separate venv and running only the installation from this repository. Possibly a note should be added to both repositories saying that the installation will fail if you have already used the AI PC installation.

Jan 12 '25 03:01 AdamMiltonBarker

.take

Mar 03 '25 08:03 thesakshidiggikar

Thank you for looking into this issue! Please let us know if you have any questions or require any help.

Mar 03 '25 08:03 github-actions[bot]

If you are still interested in this, please provide a link to the README you mentioned. The current version doesn't describe the GenAI installation: https://github.com/intel/aipc-devkit-install/blob/a929af95431e8345f069619ab2852a590ec62fba/

Aug 04 '25 17:08 Wovchena