How do I run Qwen-Image successfully on a Kaggle 2× T4 GPU instance?
!python3 -m pip install -U diffusers peft bitsandbytes
import math

import diffusers
import torch

# Quantize the transformer and text encoder to 4-bit NF4 via bitsandbytes.
qwen = diffusers.QwenImagePipeline.from_pretrained(
    'Qwen/Qwen-Image', torch_dtype=torch.float16, low_cpu_mem_usage=True,
    quantization_config=diffusers.PipelineQuantizationConfig(
        quant_backend='bitsandbytes_4bit',
        quant_kwargs={'load_in_4bit': True, 'bnb_4bit_quant_type': 'nf4',
                      'bnb_4bit_compute_dtype': torch.float16},
        components_to_quantize=['transformer', 'text_encoder']))

# Scheduler configuration for the Lightning few-step LoRA.
qwen.scheduler = diffusers.FlowMatchEulerDiscreteScheduler.from_config({
    'base_image_seq_len': 256, 'base_shift': math.log(3), 'invert_sigmas': False,
    'max_image_seq_len': 8192, 'max_shift': math.log(3),
    'num_train_timesteps': 1000, 'shift': 1, 'shift_terminal': None,
    'stochastic_sampling': False, 'time_shift_type': 'exponential',
    'use_beta_sigmas': False, 'use_dynamic_shifting': True,
    'use_exponential_sigmas': False, 'use_karras_sigmas': False})

# 4-step Lightning LoRA so that num_inference_steps=4 is enough.
qwen.load_lora_weights('lightx2v/Qwen-Image-Lightning',
                       weight_name='Qwen-Image-Lightning-4steps-V2.0.safetensors',
                       adapter_name='lightning')
qwen.set_adapters('lightning', adapter_weights=1)

# Keep submodules on the CPU and move them to the GPU one at a time.
qwen.enable_sequential_cpu_offload()

qwen(prompt='a beautiful girl', height=1280, width=720,
     num_inference_steps=4, true_cfg_scale=1).images[0].save('a.png')
----> 3 qwen = diffusers.QwenImagePipeline.from_pretrained('Qwen/Qwen-Image', torch_dtype=torch.float16, low_cpu_mem_usage=True, quantization_config=diffusers.PipelineQuantizationConfig(quant_backend='bitsandbytes_4bit', quant_kwargs={'load_in_4bit':True, 'bnb_4bit_quant_type':'nf4', 'bnb_4bit_compute_dtype':torch.float16}, components_to_quantize=['transformer', 'text_encoder']))
OutOfMemoryError: CUDA out of memory. Tried to allocate 34.00 MiB. GPU 0 has a total capacity of 14.74 GiB of which 4.19 MiB is free. Process 8568 has 14.73 GiB memory in use. Of the allocated memory 14.50 GiB is allocated by PyTorch, and 129.00 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
How can I get more CUDA memory?
@yiyixuxu @DN6
Did you try setting

import os
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"

at the very beginning of the notebook, before import torch? The setting is only picked up if it is in the environment before PyTorch initializes CUDA.

Also, did you consider running

import gc
import torch

gc.collect()
torch.cuda.empty_cache()

before the VRAM-heavy cell? gc.collect() drops unreferenced Python objects, and torch.cuda.empty_cache() releases PyTorch's cached but unused GPU memory back to the driver, so the heavy cell starts with as much free VRAM as possible.
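For reference, a minimal sketch of how the two suggestions combine in a notebook (the cell boundaries and comments are mine; the pipeline-loading code itself is unchanged from the snippet at the top):

# Cell 1: run before anything imports torch or touches CUDA.
import os
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"

# Cell 2: reclaim memory left over from earlier cells.
import gc
import torch

gc.collect()              # free unreferenced Python objects (e.g. an old pipeline)
torch.cuda.empty_cache()  # release PyTorch's cached, unused VRAM back to the driver

# Cell 3: load the pipeline and generate, as in the snippet at the top.

If the kernel has already run any CUDA code, restart it first so that Cell 1 is guaranteed to execute before CUDA is initialized.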