Trying to train images from a local folder fails

Open sam598 opened this issue 3 years ago • 2 comments

Describe the bug

I'm following this section of the diffusers training example:

# Or just load images from a local folder!
config.dataset = "imagefolder"
dataset = load_dataset(config.dataset, data_dir="path/to/folder")

It fails with repeated errors while trying to load the dataset:

OSError: [WinError 1455] The paging file is too small for this operation to complete. Error loading "C:\Users\Username\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\lib\cusolver64_11.dll" or one of its dependencies.

The custom imagefolder dataset is 2000 PNGs of 128x128 pixels.

Loading and training the smithsonian_butterflies_subset works fine. It's when trying to use the imagefolder that things fail.

I've tried going into the advanced system settings and changing the virtual memory setting from automatic to managed, but it made no difference.

Reproduction

No response

Logs

No response

System Info

Windows 11, Python 3.9.13

sam598, Aug 14 '22 01:08

@sam598 - is there any way you could use Linux instead? Sadly, I'm not sure we can already support everything on Windows at this stage of the library :-/ It's definitely on our long-term roadmap, but we probably won't have the time to test everything on Windows at the moment.

We'd be more than happy to get some help from the community though, in case you're interested :-)

What do you think, @anton-l? (Or maybe it's an easy bug and we can solve it right away?)

patrickvonplaten, Aug 14 '22 12:08

Everything else in the library seems to run fine. smithsonian_butterflies_subset downloads and caches without issue, and I am able to train and infer using the diffusers library.

Is there something different about the way the hosted datasets are loaded and prepared compared to a local image folder? It appears to try to cache the local images in the same temporary location as the hosted dataset, but the folders are all empty.

Is there a way to locally convert an imagefolder to the same format as one of the example datasets without needing to upload it? Alternatively, are there other methods for loading images that still work with the diffusers library?
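
For reference, one possible way to do this without uploading anything is to build the dataset from a list of file paths and save it locally. This is only a sketch and has not been confirmed in this thread: `list_image_paths` and `build_local_dataset` are hypothetical helper names, the "image" column name is an assumption about what the training example reads, and the set of file suffixes is arbitrary. It relies on `Dataset.from_dict`, `cast_column`, `Image`, and `save_to_disk` from the datasets library.

```python
from pathlib import Path

# Assumed set of suffixes to treat as images; adjust as needed.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def list_image_paths(folder):
    """Return the sorted paths of image files found anywhere under `folder`."""
    root = Path(folder)
    return sorted(
        str(p)
        for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )


def build_local_dataset(folder, save_dir=None):
    """Build a datasets.Dataset with a single "image" column from a local folder."""
    # Imported here so the path listing above works without `datasets` installed.
    from datasets import Dataset, Image

    paths = list_image_paths(folder)
    if not paths:
        raise ValueError(f"No image files found under {folder!r}")

    # Store the paths as strings, then have `datasets` decode them as images on access.
    dataset = Dataset.from_dict({"image": paths}).cast_column("image", Image())

    if save_dir is not None:
        # Writes the dataset in Arrow format; reload with datasets.load_from_disk(save_dir).
        dataset.save_to_disk(save_dir)
    return dataset
```

Calling `build_local_dataset("path/to/folder")` would then stand in for the `load_dataset` call. Whether this avoids the paging file error is untested.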

Maybe this issue has more to do with the datasets library. The confusing thing is that while the errors appear when calling load_dataset, each error names a different CUDA/GPU DLL. Is the library trying to load the images into virtual memory?
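
One way to check whether the load_dataset call itself is what pulls in torch would be to compare `sys.modules` before and after the call. A minimal sketch using only the standard library; `newly_imported_modules` is a hypothetical helper, not part of datasets or diffusers:

```python
import sys


def newly_imported_modules(fn, *args, **kwargs):
    """Call fn and return the top-level packages that were first imported during the call."""
    before = {name.split(".")[0] for name in sys.modules}
    try:
        fn(*args, **kwargs)
    except Exception as exc:
        # Keep going: the modules loaded before the failure are what we want to see.
        print(f"call failed with {type(exc).__name__}: {exc}")
    after = {name.split(".")[0] for name in sys.modules}
    # Only packages that were entirely absent before the call are reported.
    return after - before
```

For example, `"torch" in newly_imported_modules(load_dataset, "imagefolder", data_dir="path/to/folder")` being true would suggest torch was first imported during that call. If torch had already been imported earlier in the script, it would not show up here.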

sam598, Aug 14 '22 13:08

Hi! Could you please paste the entire error stack trace? This looks like a torch-related issue even though we don't use torch in the imagefolder loader.
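
In case it helps, one way to grab the full trace as text is to wrap the failing call with the standard-library `traceback` module. A minimal sketch; `capture_traceback` is a hypothetical helper name:

```python
import traceback


def capture_traceback(fn, *args, **kwargs):
    """Call fn; return the full formatted stack trace if it raises, else None."""
    try:
        fn(*args, **kwargs)
    except Exception:
        # format_exc() includes every frame plus any chained exceptions.
        return traceback.format_exc()
    return None
```

Something like `print(capture_traceback(load_dataset, "imagefolder", data_dir="path/to/folder"))` would then print the whole trace so it can be pasted here.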

mariosasko, Aug 16 '22 11:08

Closing due to inactivity.

patrickvonplaten, Sep 13 '22 15:09