
Flux.1-dev on 24GB VRAM OOM

Open · CarstenHoyer opened this issue 4 months ago · 0 comments

I have this predict function (shown with its imports and cog Predictor class for context):

from typing import Any

import torch
from cog import BasePredictor
from diffusers import FluxPipeline

flux_path = "black-forest-labs/FLUX.1-dev"  # or a local path to the weights


class Predictor(BasePredictor):
    def predict(self) -> Any:
        """Run a single prediction on the model"""
        device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

        # Report total VRAM in GiB
        vram = int(torch.cuda.get_device_properties(0).total_memory / (1024 * 1024 * 1024))
        print("VRAM", vram)

        pipe = FluxPipeline.from_pretrained(flux_path, torch_dtype=torch.bfloat16).to(device)
        pipe.enable_model_cpu_offload()

        prompt = "A cat holding a sign that says hello world"
        image = pipe(
            prompt,
            height=1024,
            width=1024,
            guidance_scale=3.5,
            num_inference_steps=50,
            max_sequence_length=512,
            generator=torch.Generator("cpu").manual_seed(0),
        ).images[0]
        image.save("flux-dev.png")
        return "flux-dev.png"

I have 24 GB of VRAM (the vram variable reports 23) on an NVIDIA GeForce RTX 4090.

But when I run sudo cog predict --setup-timeout 3600, I get an out-of-memory error, even though FLUX should be able to run in about 22 GB. I wonder if it is something related to cog/WSL/Docker?
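
As a rough sanity check, assuming the commonly cited parameter counts for the FLUX.1-dev components, the bfloat16 weights alone come to roughly 31 GiB, which is more than 24 GB if everything is resident on the GPU at once:

# Back-of-the-envelope weight memory for FLUX.1-dev in bfloat16 (2 bytes/param).
# Parameter counts are approximate, commonly cited figures, not measured here.
components_params = {
    "flux transformer": 12.0e9,    # ~12B
    "t5-xxl text encoder": 4.7e9,  # ~4.7B
    "clip text encoder": 0.1e9,    # ~0.1B
    "vae": 0.1e9,                  # small
}
bytes_per_param = 2  # bfloat16
total_gib = sum(components_params.values()) * bytes_per_param / 1024**3
print(f"approx. weights alone: {total_gib:.1f} GiB")  # ~31 GiB, before activations

If that estimate holds, the .to(device) call would try to place all of it on the card before offload is even enabled, which by itself could explain the OOM; the ~22 GB figure usually presumes some form of offloading.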

CarstenHoyer · Sep 25 '24, 12:09