Diffusion-Probabilistic-Models
How does `beta_arr` persist its computation graph across iterations?
Hi, first of all thank you for making the code public. I am working on implementing it in PyTorch and have a question about the current Theano implementation.
The method `generate_beta_arr` is only called once, during initialization, here. As far as I understand, the method defines the parameters `beta_perturb_coefficients`, which are then learnt during model training. Since these parameters are only used once (during initialization) to define `beta_arr`, the computation graph is only created once. The second time the model is run, the computation graph will not contain any information about `beta_perturb_coefficients`. Then how are these values learnt?
E.g., I cannot do this in PyTorch without explicitly stating that I want to retain the computation graph:
import torch
import torch.nn as nn

class Model(nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.beta_arr = nn.Parameter(torch.randn(5))
        # Built once here; backpropagating through it a second time
        # would require retain_graph=True.
        self.some_variable = self.beta_arr * self.beta_arr

    def forward(self, x):
        return self.some_variable * x
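For comparison, the pattern that would let `beta_arr` actually be learned in PyTorch (just a sketch, not the repo's code) is to recompute the derived expression inside `forward`, so the graph is rebuilt on every call:

```python
import torch
import torch.nn as nn

class Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.beta_arr = nn.Parameter(torch.randn(5))

    def forward(self, x):
        # Rebuilt on every call, so gradients reach beta_arr at each step.
        some_variable = self.beta_arr * self.beta_arr
        return some_variable * x
```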
Hi! I'm also working on implementing it, but in TensorFlow 2.x. I was interested in your question - are these coefficients learned or not? When I wrote some starter code a few days ago, I assumed that these coefficients are constants and are only assigned at the start to initialize `beta_arr` (which is also a constant variable, by the way). In the past I wrote a lot of code in TensorFlow 1.x and a little bit of Theano, so I dug deeper to see what is happening.
To answer your question: all variables created in the method `generate_beta_arr` behave as constants and are not learned during training.
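For context, here is a minimal Theano sketch (the names are illustrative, not taken from the repo) of why a shared variable stays constant when it is left out of the update rule - it sits in the graph, but nothing ever writes a new value into it:

```python
import numpy as np
import theano
import theano.tensor as T

beta_coeffs = theano.shared(np.ones(5, dtype='float32'),
                            name='beta_perturb_coefficients')
W = theano.shared(np.ones(5, dtype='float32'), name='W')

x = T.vector('x')
cost = T.sum((x * W * beta_coeffs) ** 2)

# Only parameters we take gradients of and write updates for are learned;
# beta_coeffs is left out of the updates, so it stays constant.
lr = 0.1
grads = T.grad(cost, wrt=[W])
updates = [(W, W - lr * grads[0])]

train_step = theano.function([x], cost, updates=updates)
```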
As proof, I was able to run the code in Colab, where I create the model and build the computation graph using the Blocks library. In this part (in the train.py file):
step_compute = RMSProp(learning_rate=1e-3, max_scaling=1e10)
algorithm = GradientDescent(step_rule=CompositeRule([RemoveNotFinite(),
                                                     step_compute]),
                            parameters=cg.parameters, cost=cost)
you can see that only the variables from `cg.parameters` are optimized during training. In Colab, these variables are:
I think these are the variables from the convolutional and MLP layers, because `beta_perturb_coefficients` (and the other variables created in that method) have their own names, which we assign when creating them via `theano.shared`, and those names do not appear in this list. So, if we want this variable to be learned, we should put it into the `parameters` list of the `GradientDescent` class.
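Something like this sketch, reusing `step_compute` and `cg` from the snippet above (the attribute name `dpm.beta_perturb_coefficients` is an assumption; you would have to expose the shared variable from the model however the code allows):

```python
# Hypothetical: include the beta perturbation coefficients in the trainable set.
extra_params = [dpm.beta_perturb_coefficients]
algorithm = GradientDescent(
    step_rule=CompositeRule([RemoveNotFinite(), step_compute]),
    parameters=cg.parameters + extra_params,
    cost=cost)
```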
Also, there is a TODO note above `beta_perturb_coefficients`; I think the authors didn't have time to update the code.
To be 100% sure, I ran the training loop several times, and the outputs were as follows:
At the start:
After some iterations of training via:
I got:
So, the parameters do change, but the beta coefficients do not.
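In case it's useful, a rough sketch of the kind of check I mean (the attribute name is an assumption, not the repo's exact API; the shared variable just has to be reachable from the model object):

```python
import numpy as np

# Snapshot the coefficients before training.
before = dpm.beta_perturb_coefficients.get_value().copy()
# ... run a few training iterations with the Blocks main loop ...
after = dpm.beta_perturb_coefficients.get_value()
print(np.allclose(before, after))  # True -> the coefficients were not updated
```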
Hope it helps!