
Python demos requirement incompatibility

Open RH-steve-grubb opened this issue 8 months ago • 5 comments

Describe the bug
There's still one more issue caused by the transformers upgrade aimed at the 2025.1 release. If you run a test program designed to confirm compatibility between the transformers library and the Intel-optimized optimum.intel.openvino package, you get a traceback:

Traceback (most recent call last):
  File "//./smoke-2.py", line 34, in
    output_ids = model.generate(input_ids, attention_mask=attention_mask, max_length=40)
  File "/usr/local/lib64/python3.9/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/transformers/generation/utils.py", line 2092, in generate
    self._prepare_cache_for_generation(
  File "/usr/local/lib/python3.9/site-packages/transformers/generation/utils.py", line 1714, in _prepare_cache_for_generation
    if not self._supports_default_dynamic_cache():
  File "/usr/local/lib/python3.9/site-packages/transformers/generation/utils.py", line 1665, in _supports_default_dynamic_cache
    self._supports_cache_class
AttributeError: 'OVModelForCausalLM' object has no attribute '_supports_cache_class'

The _supports_cache_class attribute was introduced recently (transformers 4.42.x), and the Optimum-Intel (OVModelForCausalLM) class hasn't implemented support for the latest caching API introduced by transformers. Upstream noticed this and added support in the optimum 1.18.1 release.

So, the requirements should be optimum[diffusers]==1.18.1. Would upgrading optimum cause any other problems?
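For context on what the failing lookup is doing, here is a minimal, hypothetical workaround sketch (not the fix proposed above): it assumes the missing class attribute is the only incompatibility and pre-sets it so transformers falls back to its legacy cache path. Pinning optimum as suggested above remains the proper fix.

from optimum.intel.openvino import OVModelForCausalLM

# Hypothetical shim: newer transformers reads self._supports_cache_class during
# generate(), but older optimum-intel model classes never define it. Setting it
# to False makes _supports_default_dynamic_cache() return False, so transformers
# uses the legacy cache instead of raising AttributeError.
if not hasattr(OVModelForCausalLM, "_supports_cache_class"):
    OVModelForCausalLM._supports_cache_class = False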

To Reproduce
Run the following program in the image after installing the demos/python_demos/requirements.txt python modules.

from optimum.intel.openvino import OVModelForCausalLM
from transformers import AutoTokenizer

# Model name compatible with OpenVINO optimizations
model_name = "gpt2"

# Load tokenizer (Transformers API)
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token

# Load optimized model (Optimum Intel API with OpenVINO backend)
model = OVModelForCausalLM.from_pretrained(model_name, export=True)

# Prepare input text
prompt = "Testing transformers and optimum.intel integration"
inputs = tokenizer(prompt, return_tensors="pt", padding=True)
input_ids = inputs.input_ids
attention_mask = inputs.attention_mask

# Generate output (testing both transformers tokenization & OpenVINO inference)
output_ids = model.generate(input_ids, attention_mask=attention_mask, max_length=40)
generated_text = tokenizer.decode(output_ids[0], skip_special_tokens=True)

print("Prompt:", prompt)
print("Generated text:", generated_text)

Expected behavior
The program should print a generated continuation of the prompt instead of raising an exception. It does with optimum==1.18.1.

Configuration
OVMS 2025.1

RH-steve-grubb commented Apr 07 '25 19:04

Fixed here: https://github.com/openvinotoolkit/model_server/pull/3211

rasapala commented Apr 08 '25 15:04

Yes, just looked. But the transformers line now allows installing versions that have the CVEs (and the CVEs are rated high: most are CVSS 8.8). The CVEs were fixed in 4.48.0. I'll give this a try and see whether pip selects the newest version or picks something that might still have the CVEs.

RH-steve-grubb commented Apr 08 '25 15:04

OK, ran the installation. It now downloads transformers from 4.49 all the way back to transformers-4.26.0 while deciding which to pick, and pip ultimately selects transformers-4.39.3. Previously the requirement was pinned at 4.40.0, so this looks like a regression from that perspective. I'd suggest making it transformers>=4.48.0,<=4.49 so that it's past all the CVEs.
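A quick way to confirm which transformers version the resolver actually installed and whether it is past the CVE floor mentioned above; a minimal sketch, assuming the packaging module is available (it is a dependency of transformers itself, so it should already be in the image):

from importlib.metadata import version
from packaging.version import Version

# 4.48.0 is the floor cited above as the first release past the CVEs.
installed = Version(version("transformers"))
floor = Version("4.48.0")
status = "past the CVE fixes" if installed >= floor else "still in the vulnerable range"
print(f"transformers {installed}: {status}")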

RH-steve-grubb commented Apr 08 '25 17:04

OK, testing my last suggestion leads to a big conflict. Pip is saying this:

The conflict is caused by:
    The user requested transformers==4.48.0
    optimum 1.18.1 depends on transformers<4.40.0 and >=4.26.0

Looking at optimum, its dependencies are kind of messed up. The 1.18.1 release notes even say "Enable transformers v4.42.0": https://github.com/huggingface/optimum-intel/releases/tag/v1.18.1

But they did not update setup.py to reflect this. They have everything right in the v1.22 setup.py. However, the package that comes from PyPI says optimum 1.22.0 depends on transformers<4.45.0 and >=4.29. I don't know how that can be.
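One way to see the dependency range pip actually resolves against, independent of what setup.py on the main branch says, is to read the installed distribution's metadata. A minimal sketch, assuming optimum is already installed in the image:

from importlib.metadata import requires

# Prints the Requires-Dist entries from the wheel's METADATA file, which is what
# pip resolves against (and can differ from setup.py in the source repository).
for req in requires("optimum") or []:
    if req.startswith("transformers"):
        print(req)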

It is possible to fix the original problem (with optimum 1.17.0) with sed:

sed -i 's/self._supports_cache_class/False/' /usr/local/lib/python3.9/site-packages/transformers/generation/utils.py

Seems like upstream optimum needs to sort out its dependencies.

RH-steve-grubb commented Apr 08 '25 19:04

@RH-steve-grubb we will enforce using a newer version of transformers without the vulnerability. With that we will also drop the old python node demo with the seq2seq use case, which is based on an old optimum-intel fork that is not in sync with the latest transformers. LLM models are now served through the OpenAI API, so that old demo is obsolete. The remaining python demos will cover stable diffusion and CLIP, and the stable diffusion demo will be replaced by the image generation endpoint in the next release, 2025.2.
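For reference, a minimal sketch of calling an LLM served by OVMS through that OpenAI-compatible API; the port, the base_url path (/v3 here), and the model name are assumptions that depend on how the server and model are deployed, so check the OVMS documentation for your setup:

from openai import OpenAI

# Point the standard OpenAI client at a local OVMS deployment; the api_key is unused.
client = OpenAI(base_url="http://localhost:8000/v3", api_key="unused")

response = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",  # placeholder: whatever name the server exposes
    messages=[{"role": "user", "content": "Testing the OpenAI-compatible endpoint"}],
    max_tokens=40,
)
print(response.choices[0].message.content)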

dtrawins commented Apr 09 '25 14:04