
ImageNet-1k statistics?

Open · mehdidc opened this issue on Jul 20 '21 · 5 comments

Hello, thanks for the great work and the package. Are there any plans to release ImageNet-1k statistics? If not, I can try to do it and provide the steps to reproduce.

mehdidc · Jul 20 '21

Hi,

Thanks for the suggestion! If you can provide me with the details of the dataset and steps to produce them, I can add these statistics too.

Regards, Gaurav

GaParmar · Jul 20 '21

Hey, I could do it successfully using the clean and legacy_pytorch modes. I just needed to add the extension 'JPEG' to https://github.com/GaParmar/clean-fid/blob/main/cleanfid/utils.py#L50 because ImageNet-1k images use that extension. I think it would be nice to make the extensions parametrizable; I could open a PR for that.
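For what it's worth, here is a rough sketch of what a parametrizable extension filter could look like; the function and variable names below are illustrative, not the actual clean-fid code:

```python
import os

# Illustrative sketch only; not the actual clean-fid internals. The idea is to
# pass the accepted extensions as an argument instead of hard-coding them, so
# ImageNet's upper-case ".JPEG" files are picked up without editing utils.py.
DEFAULT_EXTENSIONS = {"png", "jpg", "jpeg", "bmp", "webp", "JPEG"}

def list_image_files(folder, extensions=DEFAULT_EXTENSIONS):
    """Return every file under `folder` whose extension is in `extensions`."""
    files = []
    for root, _dirs, names in os.walk(folder):
        for name in names:
            if name.split(".")[-1] in extensions:
                files.append(os.path.join(root, name))
    return sorted(files)
```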

Here is the link for the stats (training and validation stats for both clean and legacy_pytorch modes): https://drive.google.com/drive/folders/1q7b-hqc-xUUGi9fGzfI1gVlJYk2Jji5h?usp=sharing

The legacy_tensorflow mode did not work; it raised an exception: return torch.stack(batch, 0, out=out) RuntimeError: stack expects each tensor to be equal size, but got [3, 250, 250] at entry 0 and [3, 150, 200] at entry 1
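For context, the RuntimeError itself is easy to reproduce with plain PyTorch and is not specific to clean-fid: the default DataLoader collate function stacks the per-image tensors, which only works if every image in the batch has the same resolution. A minimal, hypothetical reproduction:

```python
import torch
from torch.utils.data import DataLoader, Dataset

# Two "images" of different resolutions, mimicking raw ImageNet files that
# have not yet been resized to a common size.
class VariableSizeImages(Dataset):
    def __len__(self):
        return 2

    def __getitem__(self, idx):
        shapes = [(3, 250, 250), (3, 150, 200)]
        return torch.zeros(shapes[idx])

# The default collate_fn calls torch.stack on the batch, which raises:
# "RuntimeError: stack expects each tensor to be equal size, ..."
loader = DataLoader(VariableSizeImages(), batch_size=2)
next(iter(loader))
```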

Here are the steps to reproduce, so that you can compare with the stats above if you would like (a condensed script covering steps 4-7 is sketched below):

  1. Download ILSVRC2012_img_train.tar and ILSVRC2012_img_val.tar from https://image-net.org/download.php
  2. Extract the training set: mkdir train && tar xvf ILSVRC2012_img_train.tar -C train. This yields one tar per class; inside train, extract them with for v in *.tar; do tar xvf "$v"; done
  3. Extract the validation set: mkdir valid && tar xvf ILSVRC2012_img_val.tar -C valid
  4. python -c 'from cleanfid import fid;fid.make_custom_stats("imagenet1k_train", "train", mode="clean", num_workers=8, batch_size=128)'
  5. python -c 'from cleanfid import fid;fid.make_custom_stats("imagenet1k_train", "train", mode="legacy_pytorch", num_workers=8, batch_size=128)'
  6. python -c 'from cleanfid import fid;fid.make_custom_stats("imagenet1k_valid", "valid", mode="clean", num_workers=8, batch_size=128)'
  7. python -c 'from cleanfid import fid;fid.make_custom_stats("imagenet1k_valid", "valid", mode="legacy_pytorch", num_workers=8, batch_size=128)'

The training set contains 1,281,167 images and the validation set 50,000 images.
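A condensed version of steps 4-7 as a single script (the same make_custom_stats calls as above, just looped over split and mode):

```python
from cleanfid import fid

# Same calls as steps 4-7 above, collapsed into one loop. The first argument
# is the name the statistics are stored under; the second is the folder of
# extracted images.
for split, folder in [("train", "train"), ("valid", "valid")]:
    for mode in ["clean", "legacy_pytorch"]:
        fid.make_custom_stats(f"imagenet1k_{split}", folder, mode=mode,
                              num_workers=8, batch_size=128)
```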

software stack:

torch==1.8.1+cu111
torchvision==0.9.1+cu111
numpy==1.19.0
scipy==1.6.3
pillow==8.2.0
requests==2.25.1
clean-fid==0.1.13

Also:

CUDA: 11.1.1
cuDNN: 8.0.4.30

mehdidc · Jul 20 '21

Thanks for providing the details. I will take a look at the error with the "legacy_tensorflow" mode, verify the statistics with some pretrained models, and get back to you once I upload them.
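For reference, once the custom statistics are registered, the kind of check described here can look roughly like this; the generated_images folder is a placeholder for samples from some pretrained model, not something from this thread:

```python
from cleanfid import fid

# Rough sketch of the verification described above: score a folder of samples
# from a pretrained model against the registered custom statistics.
# "generated_images" is a placeholder path.
score = fid.compute_fid("generated_images",
                        dataset_name="imagenet1k_valid",
                        mode="clean",
                        dataset_split="custom")
print(f"FID vs. ImageNet-1k validation statistics: {score:.3f}")
```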

GaParmar · Jul 20 '21

Hi,

I would be careful with the preprocessing steps for the ImageNet images. With the steps you followed, all ImageNet images are resized without applying any crop, which may not be the commonly used setting. See Section A.1 of this paper for some details: https://arxiv.org/pdf/2006.10738.pdf
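For concreteness, here is a hedged sketch of the center-crop-then-resize preprocessing that is commonly used for ImageNet evaluation, as opposed to resizing the full image and distorting its aspect ratio; the 256x256 target resolution is an assumption, not something fixed by this thread:

```python
from PIL import Image

def center_crop_resize(path, size=256):
    """Center-crop to a square along the shorter side, then resize."""
    img = Image.open(path).convert("RGB")
    w, h = img.size
    s = min(w, h)
    left, top = (w - s) // 2, (h - s) // 2
    img = img.crop((left, top, left + s, top + s))
    return img.resize((size, size), resample=Image.BICUBIC)
```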

Regards, Gaurav

GaParmar · Sep 06 '21

Hi Mehdi, do you remember what image size you used for rescaling? 256x256 or 512x512? The reason I ask is because of this.

machengcheng2016 · Feb 26 '24