URAE
Sequential CPU offload and VAE tiling to reduce VRAM requirements
I haven't been able to test this code myself yet, but I will at some point. It should make the pipeline much more efficient in terms of VRAM usage: sequential CPU offload takes a 1024x1024 generation down to around 1 GB of VRAM, so it should allow this to run on consumer GPUs, at the cost of slower inference.
import torch
from diffusers import FlowMatchEulerDiscreteScheduler

# pipeline_flux and transformer_flux are the custom modules shipped with the URAE repo
from pipeline_flux import FluxPipeline
from transformer_flux import FluxTransformer2DModel
bfl_repo = "black-forest-labs/FLUX.1-dev"
# Use the FLUX scheduler with dynamic shifting disabled.
# load_config returns a plain dict, so use item assignment rather than attribute assignment.
scheduler_config = FlowMatchEulerDiscreteScheduler.load_config(bfl_repo, subfolder="scheduler")
scheduler_config["use_dynamic_shifting"] = False
scheduler = FlowMatchEulerDiscreteScheduler.from_config(scheduler_config)
transformer = FluxTransformer2DModel.from_pretrained(bfl_repo, subfolder="transformer", torch_dtype=torch.bfloat16)
# Pass the modified scheduler in, otherwise the use_dynamic_shifting change is silently ignored
pipe = FluxPipeline.from_pretrained(bfl_repo, scheduler=scheduler, transformer=transformer, torch_dtype=torch.bfloat16)

# Offload each submodule to CPU and move it to the GPU only while it is needed.
# This is the most aggressive memory saver; if you have more VRAM and want faster
# inference, use pipe.enable_model_cpu_offload() instead (don't enable both).
pipe.enable_sequential_cpu_offload()

# Decode the VAE in tiles to keep peak VRAM low at high resolutions
pipe.vae.enable_tiling()
# Load the URAE 2K adapter as LoRA weights on top of the base model
pipe.load_lora_weights("Huage001/URAE", weight_name="urae_2k_adapter.safetensors")
prompt = "An astronaut riding a green horse"
image = pipe(
prompt,
height=2048,
width=2048,
guidance_scale=3.5,
num_inference_steps=50,
max_sequence_length=512,
generator=torch.Generator("cpu").manual_seed(0)
).images[0]
image.save("flux-urae.png")
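For anyone who wants to verify the VRAM numbers on their own hardware, here is a minimal sketch using PyTorch's built-in memory statistics. Note this only tracks allocations made through PyTorch's caching allocator, and it assumes a CUDA device is present:

# Reset the peak-memory counter, run one generation, then read the high-water mark
torch.cuda.reset_peak_memory_stats()
image = pipe(prompt, height=1024, width=1024, num_inference_steps=50).images[0]
peak_gb = torch.cuda.max_memory_allocated() / 1024**3
print(f"Peak VRAM: {peak_gb:.2f} GB")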
It might also be good to document the other pipeline options:
proportional_attention=True,
ntk_factor=10.0,
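Assuming the custom URAE pipeline accepts these as keyword arguments to its __call__ (I'm inferring that from the parameter names and haven't checked the signature), they would be passed alongside the existing ones:

image = pipe(
    prompt,
    height=2048,
    width=2048,
    guidance_scale=3.5,
    num_inference_steps=50,
    max_sequence_length=512,
    proportional_attention=True,  # assumed: scale attention with the larger token count
    ntk_factor=10.0,              # assumed: NTK-aware RoPE position-scaling factor
    generator=torch.Generator("cpu").manual_seed(0),
).images[0]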