How do I run Qwen-Image successfully on a Kaggle 2× T4 GPU instance?
!python3 -m pip install -U diffusers peft bitsandbytes
import math

import diffusers
import torch

# Quantize the transformer and text encoder to 4-bit NF4 via bitsandbytes.
qwen = diffusers.QwenImagePipeline.from_pretrained(
    'Qwen/Qwen-Image', torch_dtype=torch.float16, low_cpu_mem_usage=True,
    quantization_config=diffusers.PipelineQuantizationConfig(
        quant_backend='bitsandbytes_4bit',
        quant_kwargs={'load_in_4bit': True, 'bnb_4bit_quant_type': 'nf4',
                      'bnb_4bit_compute_dtype': torch.float16},
        components_to_quantize=['transformer', 'text_encoder']))

# Scheduler configuration for the Lightning few-step LoRA.
qwen.scheduler = diffusers.FlowMatchEulerDiscreteScheduler.from_config({
    'base_image_seq_len': 256, 'base_shift': math.log(3), 'invert_sigmas': False,
    'max_image_seq_len': 8192, 'max_shift': math.log(3),
    'num_train_timesteps': 1000, 'shift': 1, 'shift_terminal': None,
    'stochastic_sampling': False, 'time_shift_type': 'exponential',
    'use_beta_sigmas': False, 'use_dynamic_shifting': True,
    'use_exponential_sigmas': False, 'use_karras_sigmas': False})

# 4-step Lightning LoRA so that num_inference_steps=4 is enough.
qwen.load_lora_weights('lightx2v/Qwen-Image-Lightning',
                       weight_name='Qwen-Image-Lightning-4steps-V2.0.safetensors',
                       adapter_name='lightning')
qwen.set_adapters('lightning', adapter_weights=1)

# Keep submodules on the CPU and move them to the GPU one at a time.
qwen.enable_sequential_cpu_offload()

qwen(prompt='a beautiful girl', height=1280, width=720,
     num_inference_steps=4, true_cfg_scale=1).images[0].save('a.png')
----> 3 qwen = diffusers.QwenImagePipeline.from_pretrained('Qwen/Qwen-Image', torch_dtype=torch.float16, low_cpu_mem_usage=True, quantization_config=diffusers.PipelineQuantizationConfig(quant_backend='bitsandbytes_4bit', quant_kwargs={'load_in_4bit':True, 'bnb_4bit_quant_type':'nf4', 'bnb_4bit_compute_dtype':torch.float16}, components_to_quantize=['transformer', 'text_encoder']))
OutOfMemoryError: CUDA out of memory. Tried to allocate 34.00 MiB. GPU 0 has a total capacity of 14.74 GiB of which 4.19 MiB is free. Process 8568 has 14.73 GiB memory in use. Of the allocated memory 14.50 GiB is allocated by PyTorch, and 129.00 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
How can I get more CUDA memory?
@yiyixuxu @DN6
Did you try setting

import os
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"

at the very beginning of the notebook, before import torch? The setting is only picked up if it is in the environment before PyTorch initializes CUDA.

Also, did you consider running

import gc
import torch

gc.collect()
torch.cuda.empty_cache()

before the VRAM-heavy cell? gc.collect() drops unreferenced Python objects, and torch.cuda.empty_cache() releases PyTorch's cached but unused GPU memory back to the driver, so the heavy cell starts with as much free VRAM as possible.
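For reference, a minimal sketch of how the two suggestions combine in a notebook (the cell boundaries and comments are mine; the pipeline-loading code itself is unchanged from the snippet at the top):

# Cell 1: run before anything imports torch or touches CUDA.
import os
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"

# Cell 2: reclaim memory left over from earlier cells.
import gc
import torch

gc.collect()              # free unreferenced Python objects (e.g. an old pipeline)
torch.cuda.empty_cache()  # release PyTorch's cached, unused VRAM back to the driver

# Cell 3: load the pipeline and generate, as in the snippet at the top.

If the kernel has already run any CUDA code, restart it first so that Cell 1 is guaranteed to execute before CUDA is initialized.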