Diffusion_models_from_scratch

Key error with file data/Imagenet64/metadata.pkl

Open opasquetdotfr opened this issue 2 years ago • 7 comments

Hello!

I am trying to train a model myself on Imagenet64x64 as a test, on a Mac using the "mps" device.

It took me a little while to realize that, after downloading Imagenet64x64, I have to:
  • run "loadImagenet64.py" to generate .pkl files in an "Imagenet64" folder,
  • THEN run "make_massive_tensor.py" to build one large .pt file,
  • THEN run "train.py", which calls "model_trainer.py".

Apparently (tell me if I am wrong), "loadImagenet64.py" expects "Imagenet64_train_part1.zip" and "Imagenet64_train_part2.zip". Imagenet64x64 does not have these files; instead it has train_data_batch_1, train_data_batch_2, train_data_batch_3, etc.
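To see what a train_data_batch_N file actually contains before converting it, a small inspection helper can help. This is a minimal sketch, assuming each batch file is a pickled dict with 'data' (N rows of 12288 uint8 values), 'labels' and, for training batches, 'mean' (the keys mentioned later in this post). The function name inspect_batch and the folder name Imagenet64_train are hypothetical.

```python
import pickle
from pathlib import Path

import numpy as np


def inspect_batch(path):
    """Load one pickled ImageNet64 batch file and summarize its contents.

    Assumes a pickled dict with 'data' (N x 12288 uint8 rows),
    'labels' (N class ids) and, for training batches, 'mean'.
    """
    with open(path, "rb") as f:
        batch = pickle.load(f)
    data = np.asarray(batch["data"])
    labels = list(batch["labels"])
    return {
        "keys": sorted(batch.keys()),
        "data_shape": data.shape,
        "data_dtype": str(data.dtype),
        "num_labels": len(labels),
    }


if __name__ == "__main__":
    # Hypothetical location of the downloaded batches; adjust to your setup.
    for batch_file in sorted(Path("Imagenet64_train").glob("train_data_batch_*")):
        print(batch_file.name, inspect_batch(batch_file))
```

A data_shape of (N, 12288) means each row is one flattened 64x64 RGB image, not one image per file.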

So I changed the code in "loadImagenet64.py" to write a series of img and label .pkl files into the "Imagenet64" folder. Then, when running "make_massive_tensor.py", I get the following errors:

Shape error with file data/Imagenet64/n.pkl
Key error with file data/Imagenet64/metadata.pkl

I probably did something wrong in "loadImagenet64.py" when formatting the pickles, but I do not know where. The dict keys seem fine: loadImagenet64 seems to replace 'data', 'mean', 'labels' found in Imagenet64 with 'data', 'mean', 'labels'.
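One way to narrow down which files trip make_massive_tensor.py is to check every .pkl in the folder against the per-image format the repo appears to expect. This is a sketch under assumptions: the expected format ({'img': flat array of 12288 values, 'label': int}) is taken from the example dict shown later in this thread, and the helpers check_sample_file and check_folder are hypothetical, not the repo's own validation logic.

```python
import pickle
from pathlib import Path

import numpy as np

IMG_SIZE = 3 * 64 * 64  # 12288 values per flattened 64x64 RGB image


def check_sample_file(path):
    """Return None if `path` looks like one sample, else a short error string.

    Assumes the per-image format {'img': flat uint8 array, 'label': int}.
    """
    with open(path, "rb") as f:
        obj = pickle.load(f)
    if not isinstance(obj, dict) or "img" not in obj or "label" not in obj:
        return f"{path}: missing 'img' or 'label' key"
    img = np.asarray(obj["img"])
    if img.size != IMG_SIZE:
        return f"{path}: img has {img.size} values, expected {IMG_SIZE}"
    return None


def check_folder(folder):
    """Check every .pkl file in `folder` and return the list of problems."""
    errors = []
    for path in sorted(Path(folder).glob("*.pkl")):
        err = check_sample_file(path)
        if err is not None:
            errors.append(err)
    return errors
```

A file like metadata.pkl that holds no 'img'/'label' keys, or an img holding a whole batch instead of one image, will show up in the returned list.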

  • Where did you get "Imagenet64_train_part1.zip", or how did you make it?
  • How should I deal with the shape and dict keys within those .pkl files?

Thank you for your help!!!

O.

opasquetdotfr, Jul 20 '23 11:07

After running "loadImagenet64.py", my .pkl files have:

  • for label: length 128116
  • for img: length 128116

"make_massive_tensor.py" cannot reshape them to (3, 64, 64) (line 62 of "make_massive_tensor.py").

In case it helps:

  • "label" type is list
  • "img" type is numpy.ndarray

opasquetdotfr, Jul 20 '23 11:07

Apparently (tell me if I am wrong): — "loadImagenet64.py" needs "Imagenet64_train_part1.zip" and "Imagenet64_train_part2.zip". Imagenet64x64 does not have these files. It rather has: train_data_batch_1, train_data_batch_2, train_data_batch_3... etc

The loadImagenet64.py script loads the data directly from the zip files downloaded from the ImageNet website, so don't unzip the archives; the script reads the zipped data as-is: https://github.com/gmongaras/Diffusion_models_from_scratch/blob/main/data/loadImagenet64.py#L19
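Reading pickled files straight out of a zip archive, without extracting it, can be done with the standard zipfile module. This is a minimal sketch, not the repo's loader: the thread doesn't describe what is inside Imagenet64_train_part1.zip, so it assumes every non-directory member is a pickle file, and iter_zipped_pickles is a hypothetical name.

```python
import pickle
import zipfile


def iter_zipped_pickles(zip_path):
    """Yield (member_name, unpickled_object) for each file in a zip archive.

    The archive is read in place; nothing is extracted to disk.
    Assumes every non-directory member is a pickle file.
    """
    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            if name.endswith("/"):  # skip directory entries
                continue
            with archive.open(name) as member:
                yield name, pickle.load(member)
```

Listing archive.namelist() is also a quick way to compare a re-zipped folder against what a loader expects, since a different member path (for example an extra top-level directory) changes every name.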

"img" type is numpy.ndarray

Did you download the numpy version of ImageNet? I think the scripts work off the base version rather than the numpy version of ImageNet, which may be causing the issue here.

Just a heads-up about training on a Mac: I haven't added MPS support to the repo because, in PyTorch 1.x, the MPS device had all kinds of weird issues. I'm not sure whether that was fixed in PyTorch 2.0. Inference may be fine, but for training I think you may run into several issues getting it to work properly.

Hope this helps! Let me know if you run into any more issues.

gmongaras, Jul 20 '23 15:07

Hello!

Thank you very much for your answer!

  • For Imagenet64_train_part1.zip in your loadImagenet64.py: I have been waiting forever for download authorization from the ImageNet website, and I am not sure they still grant it. So I eventually downloaded Imagenet64 from Kaggle.com. It has folders with the same names as your ZIP files, but zipping them does not seem to work either.
  • I changed a bit of code to get the MPS device working on Apple Silicon. I had already declared the MPS device in PyTorch in previous projects, and that worked. I do not fully understand why, as I am not a specialist, but it seems to be working like a charm now with PyTorch 2 on macOS 12.3+.
  • I will keep you posted on my tests and whether I can make it work. I may also ask someone around me for help.
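The actual code change for MPS isn't shown in this thread. A common pattern, sketched here as an assumption rather than the change made above, is to pick the best available backend at startup; torch.cuda.is_available() and torch.backends.mps.is_available() are real PyTorch calls (the latter from PyTorch 1.12 on), while pick_device is a hypothetical helper.

```python
import torch


def pick_device() -> torch.device:
    """Prefer CUDA, then Apple's MPS backend, then fall back to the CPU."""
    if torch.cuda.is_available():
        return torch.device("cuda")
    # torch.backends.mps is available in PyTorch 1.12 and later.
    if torch.backends.mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")


if __name__ == "__main__":
    device = pick_device()
    # A dummy batch shaped like the 64x64 RGB training images.
    x = torch.randn(2, 3, 64, 64, device=device)
    print(device, x.shape)
```

If training hits an operator that MPS doesn't implement, launching with the environment variable PYTORCH_ENABLE_MPS_FALLBACK=1 (e.g. PYTORCH_ENABLE_MPS_FALLBACK=1 python train.py) lets PyTorch run that op on the CPU instead, at some speed cost.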

Thank you again,

O.

opasquetdotfr, Jul 20 '23 16:07

"One more thing": :)

Do you know the shape of the tensors in your .pkl files for "img" and "label"?

opasquetdotfr, Jul 20 '23 17:07

Oh, that makes sense. The Kaggle dataset is probably in a different format from the ImageNet dataset, which is why you're running into issues loading the data. I haven't added support for the Kaggle dataset, but I'd imagine the process should be the same. Perhaps the Kaggle dataset uses the numpy version?

Anyway, it'd be awesome if you could get the Kaggle data working! Looking at the .pkl data, img is a flattened numpy array: for example, 0.pkl holds a flattened array of size 12288 (3 x 64 x 64). label is just an integer giving the class of the image.

{'img': array([ 34, 48, 80, ..., 194, 192, 188], dtype=uint8), 'label': 572}
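Given that per-image format, a batch of shape (N, 12288) can be split into one .pkl file per image. This is a sketch under assumptions: it only mirrors the example dict above ({'img': flat uint8 array, 'label': int}) and the 0.pkl naming, and write_per_image_pkls is a hypothetical helper, not part of the repo.

```python
import pickle
from pathlib import Path

import numpy as np


def write_per_image_pkls(data, labels, out_dir, start_index=0):
    """Split one batch into per-image files named 0.pkl, 1.pkl, ...

    data:   array of shape (N, 12288), one flattened 3x64x64 image per row
    labels: sequence of N integer class ids
    Each file holds {'img': <flat uint8 array>, 'label': <int>}.
    Returns the next unused index so several batches can share one folder.
    """
    data = np.asarray(data, dtype=np.uint8)
    if data.ndim != 2 or data.shape[1] != 3 * 64 * 64:
        raise ValueError(f"expected (N, 12288) rows, got {data.shape}")
    if len(labels) != data.shape[0]:
        raise ValueError("data and labels must have the same length")
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for offset, (row, label) in enumerate(zip(data, labels)):
        with open(out / f"{start_index + offset}.pkl", "wb") as f:
            pickle.dump({"img": row, "label": int(label)}, f)
    return start_index + data.shape[0]
```

When converting several train_data_batch_N files, passing the returned index back in as start_index keeps file names from colliding across batches.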

gmongaras, Jul 20 '23 17:07

Kaggle dataset uses the numpy version

Yes, it does, but that is not the problem.

it looks like img is a flattened numpy array

That makes sense here too! But this is not what I have: my img is 128116x12288 (no idea why). Just flattening it will not work, because 128116 x 12288 = 1574289408 values cannot be reshaped to 3 x 64 x 64 = 12288. So I get the error:

RuntimeError: shape '[3, 64, 64]' is invalid for input of size 1574289408

I will have a closer look at it!

O.

opasquetdotfr, Jul 20 '23 18:07

I think 128116x12288 is batch size by image size: you have a batch of 128116 images, each of size 12288, which is exactly 3x64x64.
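The arithmetic checks out: 128116 x 12288 = 1574289408, i.e. 128116 images of 12288 values each, so the array should be reshaped per row rather than as a whole. A minimal NumPy sketch, assuming each row stores the image channel-first (all red values, then green, then blue), which is what a (3, 64, 64) reshape presumes; to_image_batch is a hypothetical name.

```python
import numpy as np

IMG_SHAPE = (3, 64, 64)  # channels, height, width -> 12288 values


def to_image_batch(flat):
    """Reshape an (N, 12288) array of flattened images to (N, 3, 64, 64)."""
    flat = np.asarray(flat)
    # -1 lets NumPy infer N. Reshaping the whole array to (3, 64, 64) fails
    # because it holds N * 12288 values, not 12288.
    return flat.reshape(-1, *IMG_SHAPE)
```

In PyTorch the equivalent is x.view(-1, 3, 64, 64) on a contiguous tensor; indexing one row first (flat[i].reshape(3, 64, 64)) also works when processing images one at a time.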

gmongaras, Jul 21 '23 17:07