
Colab Fails to run half the time on a V100

Open • shadowlocked opened this issue 1 year ago • 1 comment

Describe the bug

Despite having credits and a Colab Pro subscription (not Pro plus), weeks at a time can pass before Colab will let me back onto an A100 again.

The Colab is now too bloated to run reliably on the V100s I get instead. The failure point is reg image caching. Depending on the V100 instance I get (even the ones with 25GB VRAM), the Colab stalls about 65% of the time once it has loaded more than about a thousand reg images (usually in the 1200+ range).

At this point, I can see the RAM maxed out, and the estimate for caching goes from 10 minutes to 5-10 hours.
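For anyone trying to confirm the same RAM exhaustion, here is a minimal diagnostic sketch, assuming a Linux Colab runtime where `/proc/meminfo` is readable. The helper names (`parse_meminfo`, `ram_used_fraction`, `report_ram`) are hypothetical and not part of the notebook or of diffusers; where to call them inside the caching step is also an assumption.

```python
from pathlib import Path
from typing import Dict


def parse_meminfo(text: str) -> Dict[str, int]:
    """Parse /proc/meminfo-style text into {field name: integer value}.

    Most fields are reported in kB; a few (e.g. HugePages_Total) are counts.
    """
    fields: Dict[str, int] = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        # Keep only well-formed "Name:   12345 [kB]" lines.
        if sep and parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def ram_used_fraction(meminfo: Dict[str, int]) -> float:
    """Fraction of system RAM in use, from MemTotal and MemAvailable."""
    total = meminfo["MemTotal"]
    available = meminfo["MemAvailable"]
    return 1.0 - available / total


def report_ram(label: str, path: str = "/proc/meminfo") -> float:
    """Print and return current system RAM usage (hypothetical helper)."""
    used = ram_used_fraction(parse_meminfo(Path(path).read_text()))
    print(f"[{label}] system RAM in use: {used:.1%}")
    return used
```

Calling something like `report_ram("before caching")` and then again every few hundred reg images would show whether system RAM climbs steadily toward the limit just before the caching estimate jumps.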

I think the original 2022/early 2023 script was really super-optimized to use the last dregs of system resources on free Colab tiers, and that some recent changes have pushed these bare margins over the edge.

If there is any ballast left in the essential resource loading, it would be great if it could be shed in the Colab. A100s are getting harder and harder to obtain in Colab, presumably for anything less than Colab Pro+++ or whatever they call the most expensive tier (I am only subscribed to Colab Pro, middle tier).

Reproduction

Not applicable

Logs

No response

System Info

Colab current

shadowlocked, Aug 03 '23 14:08

Same bug with my Colab Pro subscription.

whitemankpi, Oct 13 '23 10:10