
Issue with running Flux.1.dev on iGPU

Open stsxxx opened this issue 7 months ago • 14 comments

**Describe the bug** I'm trying to run the FLUX.1-dev model in FP16 on the integrated GPU of an Intel® Core™ Ultra 7 165H. I attempted to generate an image with 50 inference steps, but a single step takes an extremely long time and never completes. Additionally, I ran Stable Diffusion XL in FP16 under the same setup (50 steps, 1024×1024), and it completed in about 3 minutes per image, which is significantly faster than FLUX.1-dev but still slow.

**Expected behavior** I want to verify whether my Intel integrated GPU (iGPU) is correctly activated and used during inference. Could you guide me on how to check its status and ensure it is being utilized properly in my setup? Or is the performance I'm seeing simply expected for this hardware?

**Screenshots** The code I am using: [screenshot]

**Installation instructions** I am not using the notebook. All models have been converted using `optimum-cli export openvino`. GPU and NPU can be detected: [screenshot]

**Environment information**
openai==1.77.0
opencv-python==4.11.0.86
openvino==2025.1.0
openvino-genai==2025.1.0.0
optimum==1.25.0.dev0
optimum-intel==1.23.0.dev0+590692f

stsxxx avatar May 08 '25 01:05 stsxxx

Instead of a screenshot, could you show the code you used as text, please? That would make reproduction much easier. Is the code based on another sample or on a Jupyter notebook?

Can you provide more details about your environment (operating system, version information)?

brmarkus avatar May 08 '25 07:05 brmarkus

Thank you for your reply. The code is shown below. It's based on a simple diffusion model example from the OpenVINO repository. I'm using Ubuntu 24.04.2 LTS, Python 3.12.3, and torch 2.7.0.


import argparse
import sys

import openvino_genai
from PIL import Image
from tqdm import tqdm

seed = 42
num_inference_steps = 50
random_generator = openvino_genai.TorchGenerator(seed)
pbar = tqdm(total=num_inference_steps)

def callback(step, num_steps, latent):
    # Keep the progress bar in sync with the pipeline's actual step count.
    if num_steps != pbar.total:
        pbar.reset(num_steps)
    pbar.update(1)
    sys.stdout.flush()
    return False  # returning True would cancel generation

def main():
    parser = argparse.ArgumentParser()
    parser.add_argument('model_dir')
    parser.add_argument('prompt')
    args = parser.parse_args()

    pipe = openvino_genai.Text2ImagePipeline(args.model_dir, device="GPU")
    print(pipe)

    result = pipe.generate(
        args.prompt,
        num_inference_steps=num_inference_steps,
        generator=random_generator,
        callback=callback,
        height=1024,
        width=1024,
    )
    pbar.close()

    final_image = Image.fromarray(result.data[0])
    final_image.save("output.png")

if __name__ == '__main__':
    main()

command: python3 test.py "model directory" "cyberpunk cityscape like Tokyo and New York with tall buildings at dusk, golden hour, cinematic lighting"

stsxxx avatar May 08 '25 07:05 stsxxx

I just tried to run the Jupyter notebook under https://github.com/openvinotoolkit/openvino_notebooks/blob/e5a8aa127c9464a356a6767d2fb62b88ed21be3c/notebooks/flux.1-image-generation/flux.1-image-generation.ipynb

but it fails for me with

!huggingface-cli download {ov_model_id} --local-dir {model_dir}

requests.exceptions.HTTPError: 404 Client Error: Not Found for url: https://huggingface.co/api/models/OpenVINO/FLUX.1-dev-int4-ov/revision/main Repository Not Found for url: https://huggingface.co/api/models/OpenVINO/FLUX.1-dev-int4-ov/revision/main.

(being logged into HuggingFace, accepted license agreement, access-token provided)

EDIT: According to this issue https://github.com/openvinotoolkit/openvino_notebooks/issues/2792 it was working in March this year.

@eaidova was the model recently moved, renamed, or removed, do you know? Or do my Hugging Face credentials (I'm located in Europe/Germany) not allow access to the model? In Hugging Face, I see "Gated model: You have been granted access to this model".

brmarkus avatar May 08 '25 08:05 brmarkus

@brmarkus the dev model was never uploaded to the Hugging Face Hub under the OpenVINO account; unfortunately its license agreement does not allow that. We only have schnell: https://huggingface.co/OpenVINO/FLUX.1-schnell-int4-ov

eaidova avatar May 08 '25 09:05 eaidova

I was using `optimum-cli export openvino --model black-forest-labs/FLUX.1-dev --task text-to-image --weight-format fp16 ov_model_flux/` to convert the Hugging Face model into OpenVINO IR format, and I load the model from the ov_model_flux/ directory by calling `pipe = openvino_genai.Text2ImagePipeline('ov_model_flux/', device="GPU")`.
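If the FP16 weights are too heavy for the iGPU, the same tool can produce a lower-precision export for comparison. A sketch (assuming your optimum-intel build supports `--weight-format int4`; the output directory name is arbitrary):

```shell
optimum-cli export openvino \
  --model black-forest-labs/FLUX.1-dev \
  --task text-to-image \
  --weight-format int4 \
  ov_model_flux_int4/
```

An INT4 export roughly quarters the transformer's memory footprint compared to FP16, which matters on an iGPU that shares system memory.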

stsxxx avatar May 08 '25 09:05 stsxxx

> @brmarkus the dev model was never uploaded to the Hugging Face Hub under the OpenVINO account; unfortunately its license agreement does not allow that. We only have schnell: https://huggingface.co/OpenVINO/FLUX.1-schnell-int4-ov

Ok, thank you. Now I have initiated download, conversion, and compression using the Jupyter notebook for the model "black-forest-labs/FLUX.1-schnell". This is going to take a while.

Then I will try to reproduce inference using CPU and GPU on my Windows 11 Pro system (64 GB system memory, Intel Core Ultra 7 155H), using the prompt "cyberpunk cityscape like Tokyo and New York with tall buildings at dusk, golden hour, cinematic lighting".

brmarkus avatar May 08 '25 09:05 brmarkus

With the default parameters:

Pipeline settings
Input text: cyberpunk cityscape like Tokyo and New York with tall buildings at dusk, golden hour, cinematic lighting
Image size: 256 x 256
Seed: 42
Number of steps: 4

With the default checkbox "Use compressed models" activated.

Using the CPU, progress bar: 100% 4/4 [01:44<00:00, 22.83s/it]

[generated image]

brmarkus avatar May 08 '25 09:05 brmarkus

Using the GPU, same parameters, same checkboxes, using "black-forest-labs/FLUX.1-schnell":

Progress bar: 100% 4/4 [00:21<00:00, 2.98s/it]

[generated image]

Task Manager showing GPU utilization:

[screenshot]

@stsxxx your code uses "num_inference_steps = 50", while the Jupyter notebook uses only "Number of steps: 4".

Is there a bigger difference between "FLUX.1-schnell" and "FLUX.1-dev"?

brmarkus avatar May 08 '25 09:05 brmarkus

I think the term "Non-Commercial Use Only" wouldn't allow me to use "FLUX.1-dev"...

@stsxxx do you see similar values in your environment when using "FLUX.1-schnell" instead? That would let you compare and address your initial question of "how to check its status and ensure it's being utilized properly in my setup".
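On Ubuntu there is no Task Manager GPU view, but engine utilization can be watched live while the pipeline runs. A sketch (assuming the `intel-gpu-tools` package; the monitor typically needs root):

```shell
# Install Intel's GPU tools and watch render/compute engine load in real time
sudo apt install intel-gpu-tools
sudo intel_gpu_top
```

If the render/compute engines stay near 0% during generation, the work is not actually landing on the iGPU.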

brmarkus avatar May 08 '25 09:05 brmarkus

@brmarkus Thank you for your reply. The reason I'm using FLUX.1-dev in FP16 with 50 total steps is my experimental setup. I'll give FLUX.1-schnell with INT4 a try, but it would be ideal if FLUX.1-dev were supported.

Also, if possible, could you try running Stable Diffusion XL in FP16 with image size 1024x1024? Since I was able to run it successfully, we could use it as a comparison to check whether my GPU is being utilized correctly.

It’s late night on my end, so I’ll run FLUX.1-schnell tomorrow. Thank you again for your help!

stsxxx avatar May 08 '25 09:05 stsxxx

@brmarkus Hi, I tried to run FLUX.1-schnell following the code provided here: https://huggingface.co/OpenVINO/FLUX.1-schnell-fp16-ov. I used only 4 steps, but the issue persists: it hasn't completed even after 30 minutes.

stsxxx avatar May 09 '25 21:05 stsxxx

Would you have a chance to run the Jupyter notebook

https://github.com/openvinotoolkit/openvino_notebooks/blob/e5a8aa127c9464a356a6767d2fb62b88ed21be3c/notebooks/flux.1-image-generation/flux.1-image-generation.ipynb ? The notebook uses a compressed and quantized variant in OpenVINO IR format. That could really make a difference!

On my CPU (and MS-Win11-Pro, 64GB RAM), I got "4/4 [01:44<00:00, 22.83s/it]" - less than 2 minutes for 4 iterations. And on the GPU I got "4/4 [00:21<00:00,  2.98s/it]" - less than 30 seconds for 4 iterations.

brmarkus avatar May 09 '25 21:05 brmarkus

I'll give it a try. Does this mean that other model formats aren't supported here? Even the provided notebook mentions that you can use FLUX.1-dev by simply switching. I'm using the weights from your model hub, but it's not working, so I believe there may be another issue at play.

stsxxx avatar May 09 '25 21:05 stsxxx

OpenVINO supports different formats (like ONNX and others), but there is also an optimized format called Intermediate Representation ("IR"). In this case INT4 is used, but INT8, FP16, FP32 (or BF16) could be used as well. In addition, the model got compressed. You could use tools like "model_analyzer" to compare the different variants. Depending on the underlying hardware, specific (CPU) instructions are used. In the IR format there are two files, an XML file and a BIN file. Please replace both when switching models.

brmarkus avatar May 10 '25 06:05 brmarkus