
Clarification of the BPD results on ImageNet32/ImageNet64

Open zhengkw18 opened this issue 1 year ago • 2 comments

Congratulations on your good work! I think DenseFlow is the SOTA among normalizing flows, but I would like to make some clarifications regarding its comparison with other methods (such as diffusion models).

I was comparing DenseFlow against VDM on ImageNet64x64.

DenseFlow: 3.35 BPD, 130M, 1 V100 ~2 weeks

VDM: 3.4 BPD, ?M, 128 TPUv3 for ?weeks?

It looks like DenseFlow gets better BPD with ~100x less compute.

I think the reason DenseFlow achieves such a good BPD on ImageNet32/ImageNet64 at a distinctly lower computational cost is that the wrong version of downsampled ImageNet was used. I recently uploaded the code for our ICML 2023 paper Improved Techniques for Maximum Likelihood Estimation for Diffusion ODEs (https://github.com/thu-ml/i-DODE), where this issue is highlighted:

There are two different versions of ImageNet32 dataset. For fair comparisons, we use both versions of ImageNet32, one is downloaded from https://image-net.org/data/downsample/Imagenet32_train.zip, following Flow Matching [3], and the other is downloaded from http://image-net.org/small/train_32x32.tar (old version, no longer available), following ScoreSDE and VDM. The former dataset applies anti-aliasing and is easier for maximum likelihood training.
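To illustrate why an anti-aliased downsample might be easier to model, here is a minimal sketch in pure Python. It is not the pipeline used to build either ImageNet32 release. The synthetic image, the 2x2 box filter, and the helper names (`subsample`, `box_downsample`, `neighbor_roughness`) are all assumptions made for illustration. It only shows the general effect that averaging before subsampling removes fine detail that plain subsampling keeps.

```python
import random

def subsample(img, factor):
    """Downsample by keeping every `factor`-th pixel (no anti-aliasing)."""
    return [row[::factor] for row in img[::factor]]

def box_downsample(img, factor):
    """Downsample by averaging each factor x factor block (a simple anti-aliasing filter)."""
    h, w = len(img), len(img[0])
    out = []
    for i in range(0, h - h % factor, factor):
        row = []
        for j in range(0, w - w % factor, factor):
            block = [img[i + di][j + dj]
                     for di in range(factor) for dj in range(factor)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out

def neighbor_roughness(img):
    """Mean squared difference between horizontally adjacent pixels."""
    diffs = [(row[j + 1] - row[j]) ** 2
             for row in img for j in range(len(row) - 1)]
    return sum(diffs) / len(diffs)

# Hypothetical 64x64 grayscale image: a smooth ramp plus per-pixel noise.
rng = random.Random(0)
image = [[x / 63 + rng.uniform(-0.5, 0.5) for x in range(64)]
         for _ in range(64)]

aliased = subsample(image, 2)            # 32x32, fine detail kept as-is
antialiased = box_downsample(image, 2)   # 32x32, fine detail averaged out

print("roughness without anti-aliasing:", neighbor_roughness(aliased))
print("roughness with anti-aliasing:   ", neighbor_roughness(antialiased))
```

On this synthetic input the anti-aliased result has noticeably smaller pixel-to-pixel variation. Less high-frequency content to encode is one plausible reason a likelihood model reaches a lower BPD on the anti-aliased version, though the sketch does not measure BPD itself.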

Clearly, DenseFlow uses the new version of ImageNet32/64 (https://github.com/matejgrcic/DenseFlow/blob/473220a9c02b262b481fbaa50a947e40bad3f99c/denseflow/data/datasets/image/imagenet32.py), which favors its BPD. Therefore, I suggest that the author clarify this and remove the BPD result from the rank list (https://paperswithcode.com/paper/densely-connected-normalizing-flows), where the other methods use the old version of ImageNet, which makes the comparison unfair and confusing.

zhengkw18, Nov 29 '23 06:11

We conducted experiments on both versions of ImageNet32, and found that the new version typically results in about 0.3 lower BPD than the old version: 3.43 (new version, batch size 128, A40 GPU) vs. 3.69 (old version, batch size 512, A100 GPU). So the dataset difference is rather notable.
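For readers less familiar with the metric, the sketch below shows the standard conversion between negative log-likelihood in nats and bits per dimension (BPD), and applies it to the two ImageNet32 figures quoted above. The function and variable names are illustrative and do not come from the DenseFlow or i-DODE code. The dimensionality 3 x 32 x 32 assumes RGB images at 32x32 resolution.

```python
import math

def nats_to_bpd(nll_nats, num_dims):
    """Convert a per-image negative log-likelihood in nats to bits per dimension."""
    return nll_nats / (num_dims * math.log(2))

def bpd_to_nats(bpd, num_dims):
    """Convert bits per dimension back to a per-image negative log-likelihood in nats."""
    return bpd * num_dims * math.log(2)

# Assumption: ImageNet32 images are RGB, 32x32 pixels.
IMAGENET32_DIMS = 3 * 32 * 32

# Figures quoted in the comment above.
bpd_new = 3.43  # new (anti-aliased) version
bpd_old = 3.69  # old version

gap_bpd = bpd_old - bpd_new
gap_bits_per_image = gap_bpd * IMAGENET32_DIMS
gap_nats_per_image = bpd_to_nats(gap_bpd, IMAGENET32_DIMS)

print(f"gap: {gap_bpd:.2f} BPD")
print(f"gap: {gap_bits_per_image:.0f} bits per image")
print(f"gap: {gap_nats_per_image:.0f} nats per image")
```

The quoted numbers differ by 0.26 BPD, which over 3072 dimensions is roughly 800 bits per image. Note that the two runs also differ in batch size and GPU, so the sketch only restates the reported gap and does not isolate the dataset effect.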

It seems that Efficient-VDVAE on https://paperswithcode.com/sota/image-generation-on-imagenet-64x64 also uses the wrong version of ImageNet, which leads to an unfair comparison.

Under a fair comparison, VDM is still the current SOTA likelihood model on CIFAR10/ImageNet32/ImageNet64.

zhengkw18, Nov 29 '23 07:11

Hi, thanks for pointing out the mismatch between the two versions of IN32. As far as I know, this is mostly unknown in the community, and the old version being unavailable doesn't help. I will update the README to make it clearer that we trained on the new version of IN32. Cheers!

matejgrcic, Dec 17 '23 20:12