Huadong Liao
You're right, a bijection means the dimension is unchanged. It's the squeezing and splitting operations in the multi-scale architecture that change the dimension of x. But you misunderstood the number of channels in Table 1 (512, 128, etc.)....
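To make the point concrete, here is a minimal NumPy sketch of the standard squeeze operation used in multi-scale flow architectures (as in Glow); the function names are my own, not from the DLF codebase. The spatial dimensions shrink while the channel count grows, so the shape of x changes but the total number of elements (and hence bijectivity) is preserved.

```python
import numpy as np

def squeeze(x, factor=2):
    """Space-to-depth squeeze: (B, H, W, C) -> (B, H/f, W/f, C*f*f).
    The element count is unchanged, so the map remains invertible."""
    B, H, W, C = x.shape
    assert H % factor == 0 and W % factor == 0
    x = x.reshape(B, H // factor, factor, W // factor, factor, C)
    x = x.transpose(0, 1, 3, 5, 2, 4)
    return x.reshape(B, H // factor, W // factor, C * factor * factor)

def unsqueeze(x, factor=2):
    """Exact inverse of squeeze."""
    B, H, W, C = x.shape
    assert C % (factor * factor) == 0
    x = x.reshape(B, H, W, C // (factor * factor), factor, factor)
    x = x.transpose(0, 1, 4, 2, 5, 3)
    return x.reshape(B, H * factor, W * factor, C // (factor * factor))
```

For example, a (2, 4, 4, 3) batch becomes (2, 2, 2, 12) after one squeeze, and unsqueezing recovers the original tensor exactly.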
We have no plans for a PyTorch version, as I'm busy with DLF v2.0, which aims to improve the multi-scale architecture. Maybe we will release DLF v2.0 with a PyTorch version.
Thanks @BeautyGlow for pointing this out. We are now running an additional experiment to make this comparison more rigorous, because the affine coupling layer is a case of our method when...
Even when K=2, our dynamic linear transformation differs from the affine coupling layer, as discussed in Section 3.1. We found that K=4 and K=6 for the inverse dynamic linear transformation are also worse than...
Yeah, it turns out our best results are obtained by changing y1 = x1 in the affine coupling layer to y1 = s1*x1 + u1 (like an Actnorm layer). This is reasonable....
@lukemelas The changes in our best case (K=2) compared to Glow can be summarized in three points: 1. in the affine coupling layer, we choose h(x_1) = s_1*x_1 + u_1 instead of...
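A minimal NumPy sketch of this modified coupling layer, under my own reading of the comment: the identity branch y1 = x1 is replaced by a per-channel affine map y1 = s1*x1 + u1, while the second half is transformed conditioned on x1 as usual. The names `nn_fn`, `s1`, and `u1` are hypothetical stand-ins for the learned network and Actnorm-like parameters, not identifiers from the DLF code.

```python
import numpy as np

def coupling_forward(x, nn_fn, s1, u1):
    """Affine coupling with an Actnorm-like first branch.
    nn_fn(x1) returns (log_s2, u2) for transforming the second half."""
    x1, x2 = np.split(x, 2, axis=-1)
    log_s2, u2 = nn_fn(x1)
    y1 = s1 * x1 + u1                  # instead of y1 = x1
    y2 = np.exp(log_s2) * x2 + u2
    # per-example log-determinant: both halves now contribute
    logdet = (np.sum(np.broadcast_to(np.log(np.abs(s1)), x1.shape), axis=-1)
              + np.sum(log_s2, axis=-1))
    return np.concatenate([y1, y2], axis=-1), logdet

def coupling_inverse(y, nn_fn, s1, u1):
    """Inverse: recover x1 first, then condition on it to invert y2."""
    y1, y2 = np.split(y, 2, axis=-1)
    x1 = (y1 - u1) / s1
    log_s2, u2 = nn_fn(x1)
    x2 = (y2 - u2) / np.exp(log_s2)
    return np.concatenate([x1, x2], axis=-1)
```

The layer stays exactly invertible because x1 is recoverable from y1 alone, which is all that conditioning on x1 requires.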
In the top prior layer, the mean and logs are shared across the spatial dimensions in the non-conditional case (ycond=False), meaning (mean, logs) = tensor(1, 1, 1, 2*n). In the implementation,...
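A short NumPy sketch of what spatial sharing means here, assuming the Glow-style convention of a single (1, 1, 1, 2*n) parameter tensor broadcast over a (B, H, W, n) latent; the helper names are mine, not from the repo.

```python
import numpy as np

def split_prior_params(h):
    """Split a (1, 1, 1, 2*n) parameter tensor into mean and logs,
    each (1, 1, 1, n); they broadcast over every spatial position."""
    mean, logs = np.split(h, 2, axis=-1)
    return mean, logs

def gaussian_logp(z, mean, logs):
    """Elementwise diagonal-Gaussian log-density; mean/logs broadcast
    against z of shape (B, H, W, n)."""
    return -0.5 * (np.log(2 * np.pi) + 2.0 * logs
                   + (z - mean) ** 2 / np.exp(2.0 * logs))
```

With zero-initialized parameters this reduces to a standard normal prior at every pixel, which matches the usual initialization of such layers.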
It's for training stability; see the experiments section of our paper.
> Could you double check that your small imagenet datasets are the same as http://image-net.org/small/train_32x32.tar, and http://image-net.org/small/valid_32x32.tar?

Yes, it's from there. I used the [scripts from the Glow repo](https://github.com/openai/glow/tree/master/data_loaders/generate_tfr) to generate...
> different preprocessing on imagenet can greatly affect the likelihood you get

Confirmed, you're right. I tested this repo on CelebA 256x256 and ImageNet 32x32, by downsampling them...
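For readers comparing numbers across papers, here is the standard nats-to-bits/dim conversion used for likelihoods on images. The function itself is just unit conversion; the caveat in the comment is the point of the thread, and the function name is my own.

```python
import numpy as np

def bits_per_dim(nll_nats, image_shape):
    """Convert a per-image negative log-likelihood in nats to bits/dim.
    Note: two bits/dim numbers are only comparable when the images went
    through identical preprocessing (same resize filter, same
    quantization), which is why different ImageNet downsampling
    pipelines yield different likelihoods for the same model."""
    H, W, C = image_shape
    return nll_nats / (np.log(2.0) * H * W * C)
```

For example, a per-image NLL of ln(2) * 3072 nats on a 32x32x3 image corresponds to exactly 1.0 bits/dim.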