glow-realnvp-tutorial
NaN raised when using the model to fit the `celebA` dataset
The initial loss / log_prob is very big, like 1e24. I tried CUB200 as well; CUB200 has an initial loss of around 40000 and training goes well. But when I try to use your model in the `realworld` folder to fit celebA, it fails at the very beginning and raises NaN. Both CUB200 and celebA are resized to the same size (224, 224, 3), so why does it fail on celebA?
BTW, this tutorial is very good!
Let me show you the differences from Glow, as well as Glow's problem and tfp's problem.
- In Glow, the gaussianize of the factored-out latent has an issue; I think this is a reason for raising NaN (but I have no proof). In RealNVP, Flow++, and my implementation: z_i, h_i = factor_out(h_{i-1}) with z_i ~ N(0, 1). In Glow: z_i ~ N(mu, sigma) with mu, sigma = convnet(h_i).
- We need weight normalisation, because this network is too sensitive (Glow has the same problem).
- tfp's log-det Jacobian has shape [], but in an affine coupling the log-det Jacobian has shape [batch_size]; this may show that tfp has a critical problem in its loss formula (see the sketch below).
In some of my experiments, weight normalisation is good for preventing NaN, but I don't know of any papers about this.
(By the way, I'm writing another TensorFlow normalization implementation in TFGENZOO.)
ref https://github.com/tensorflow/probability/issues/576
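To make the shape point concrete, here is a minimal sketch (my own illustration, not code from tfp or this repository) of what the per-sample log-det Jacobian should look like for image inputs:

```python
import tensorflow as tf

batch, H, W, C = 4, 8, 8, 3
log_s = tf.random.normal([batch, H, W, C])  # log-scale output of a coupling NN

# Per-sample log-det: sum over every axis except the batch axis.
log_det_per_sample = tf.reduce_sum(log_s, axis=[1, 2, 3])
print(log_det_per_sample.shape)  # (4,) == [batch_size]

# A full reduction gives shape [], which mixes all samples into one number
# and changes how the loss is averaged.
log_det_scalar = tf.reduce_sum(log_s)
print(log_det_scalar.shape)  # ()
```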
Regarding "his gaussianize for the factored-out latent", I don't quite get it. What is h_i? By z_i ~ N(mu, sigma), do you mean the latent variable's distribution (a Gaussian) has learnable parameters mu and sigma which are learned by a convnet? Or are you talking about the affine coupling layer?
Regarding the tfp problem, are you saying the log-det should be broadcast to shape [batch_size,] instead of being a single value for the whole batch?
I think weight normalization is a good idea 😊, I will try it!
- mu and sigma are trainable in Glow (they are produced by a small network that is trained jointly); this is not an affine coupling layer. See the sketch below.
  ref.
  1. https://github.com/openai/glow/blob/master/model.py#L89 (this is the split)
  2. https://github.com/openai/glow/blob/master/model.py#L89 (its definition)
  3. https://github.com/openai/glow/blob/master/model.py#L576-L584 (get mu and sigma from z1)
  4. https://github.com/openai/glow/blob/master/model.py#L552 (gaussianize z2 with mu and sigma)
- Yes, I think so.
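Here is a minimal sketch of the difference I mean (the `prior_net` layer and all sizes are just for illustration, not the exact Glow code):

```python
import tensorflow as tf
import tensorflow_probability as tfp

tfd = tfp.distributions

h = tf.random.normal([4, 8, 8, 8])    # activations before the factor-out
z1, z2 = tf.split(h, 2, axis=-1)      # split: z1 is kept, z2 is factored out

# RealNVP / Flow++ style: the factored-out z2 is scored against N(0, 1).
log_p_realnvp = tf.reduce_sum(
    tfd.Normal(0.0, 1.0).log_prob(z2), axis=[1, 2, 3])

# Glow style: (mu, sigma) are predicted from z1 by a small trainable convnet,
# and z2 is scored against N(mu, sigma) -- the "gaussianize" step.
prior_net = tf.keras.layers.Conv2D(2 * z2.shape[-1], 3, padding="same",
                                   kernel_initializer="zeros")
mu, log_sigma = tf.split(prior_net(z1), 2, axis=-1)
log_p_glow = tf.reduce_sum(
    tfd.Normal(mu, tf.exp(log_sigma)).log_prob(z2), axis=[1, 2, 3])
```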
Thx Mokke. I totally understand your points.
I found something kind of interesting in the Glow source code: they were not using an affine coupling layer; instead they used additive coupling. I think it might have something to do with Actnorm. I noticed you used affine coupling in your implementation of Glow. Could there be a conflict between Actnorm and affine coupling that might cause the NaN?
I think not. It's true that the affine coupling layer has a multiply operation (additive coupling doesn't have it), but the multiply operation uses a scaled value. (https://github.com/MokkeMeguru/glow-realnvp-tutorial/blob/master/examples/models/affineCoupling.py#L55-L63)
(So I think "scaled" values are a very good method for avoiding NaN. For example, I recommend using weight normalization, as in the sketch below. And someone says "we should use normalization for inv1x1conv": https://github.com/openai/glow/issues/40#issuecomment-462103120)
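A rough sketch of what I mean by weight normalization in the coupling NN, using tensorflow-addons' `WeightNormalization` wrapper (just one possible way to do it, not the code in this repository):

```python
import tensorflow as tf
import tensorflow_addons as tfa

def coupling_nn(out_channels, width=128):
    """Coupling network with weight-normalized convolutions."""
    return tf.keras.Sequential([
        tfa.layers.WeightNormalization(
            tf.keras.layers.Conv2D(width, 3, padding="same", activation="relu")),
        tfa.layers.WeightNormalization(
            tf.keras.layers.Conv2D(width, 1, padding="same", activation="relu")),
        # Last conv is zero-initialized so the coupling starts near the identity.
        tf.keras.layers.Conv2D(out_channels, 3, padding="same",
                               kernel_initializer="zeros"),
    ])
```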
Thx again for explaining all of this!
There is one more thing I found in your code recently. For the sake of convenience, I'll post it here; I hope you don't mind. 😅
I found something confusing in your implementation of Glow: in the affineCoupling layer, the Jacobian doesn't seem quite right.
https://github.com/MokkeMeguru/glow-realnvp-tutorial/blob/10461d7a0db9fb59e8b630668d2409ec7dcd43fa/realworld/layers/affineCoupling.py#L156
The Jacobian is sum(log|s|) in the original paper, as far as I understand. Shouldn't it be tf.reduce_sum(tf.math.abs(log_s))?
First, |log s| and log|s| are not the same. In the paper, the reason they use log|s| instead of log s is that log x is not defined for x <= 0.
So they say that s's domain is ℝ, not ℝ⁺. But look at the function of the affine coupling layer in the paper,
y = exp(log_s) * x + t
Since s = exp(log_s), s's domain is actually ℝ⁺. So we can use log_s directly instead of log|s|.
Q. Why do I compute log_s instead of s in the NN layer? A. If s is 0.0000000...1 (or underflows to 0), log s blows up and you get NaN.
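A minimal sketch of that point (my own illustration of a generic affine coupling forward pass, not this repository's exact code):

```python
import tensorflow as tf

def affine_coupling_forward(x1, x2, nn):
    # nn outputs twice the channels of x2: one half is log_s, the other is t.
    log_s, t = tf.split(nn(x1), 2, axis=-1)
    # s = exp(log_s) is strictly positive, so log|s| == log_s by construction
    # and we never take log() of a value that could be ~0 or negative.
    y2 = x2 * tf.exp(log_s) + t
    # Per-sample log-det Jacobian, shape [batch_size].
    log_det = tf.reduce_sum(log_s, axis=[1, 2, 3])
    return x1, y2, log_det
```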
How do I save the model in RealNVP? I tried to use flow.save(), but it raises "object has no attribute 'save'".
I think you tried to save a tf.keras.layers.Layer. Please wrap your layer in a tf.keras.Model. ref. https://github.com/MokkeMeguru/glow-realnvp-tutorial/blob/master/realworld/model.py#L116
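For example, a minimal sketch (the names are hypothetical) of wrapping a flow layer so it gets the Model saving APIs:

```python
import tensorflow as tf

class FlowModel(tf.keras.Model):
    """Wrap a flow layer (a tf.keras.layers.Layer) in a tf.keras.Model."""

    def __init__(self, flow_layer):
        super().__init__()
        self.flow = flow_layer

    def call(self, x):
        return self.flow(x)

# model = FlowModel(flow)
# model(dummy_batch)               # run once so all variables are built
# model.save_weights("flow_ckpt")  # or model.save(...) for the full model
```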
@MokkeMeguru Hey, this issue made me curious what a tfp version of Glow generating celebA would look like... so I tinkered with your code a bit, adding variational dequantization and making the neural nets a bit more complex, and trained it on celebA. The results from the first few epochs look really bad..... I use transformedDistribution.sample() and plot the image directly.
Do you think the result above is correct?
The NaN problem is caused by the data preprocessing... The data should be correctly mapped into logit space (the term might be wrong) with dequantization added, just like you did in TFGENZOO.
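For reference, a rough sketch of the kind of preprocessing I mean, in the usual RealNVP style (the alpha value is just a typical choice, not necessarily what TFGENZOO uses):

```python
import tensorflow as tf

def preprocess(images, alpha=0.05):
    """Uniform dequantization + logit transform for uint8 images in [0, 255]."""
    x = tf.cast(images, tf.float32)
    x = (x + tf.random.uniform(tf.shape(x))) / 256.0  # dequantize into (0, 1)
    x = alpha + (1.0 - 2.0 * alpha) * x                # keep values away from 0 and 1
    return tf.math.log(x) - tf.math.log(1.0 - x)       # logit(x)
```

(The log-det Jacobian of this logit transform also has to be added to the log-likelihood if you want correct bits/dim.)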
Your work in TFGENZOO is fantastic! Due to some dependency issues I'm sticking with tfp, though. But anyway, there are barely any open-source flow-based model codebases in tfp, not even in tf2. 😆
So, can you help me out please? Do you think the model is correct based on that sampled image? What might be causing it, in your opinion? Thx for your work again!!!!