Diffusion-Probabilistic-Models

How does `beta_arr` persist its computation graph across iterations?

Open · vinsis opened this issue 2 years ago · 1 comment

Hi, first of all thank you for making the code public. I am working on implementing it in PyTorch and have a question about the current Theano implementation.

The method generate_beta_arr is only called once, during initialization, here. As far as I understand, the method defines the parameters beta_perturb_coefficients, which are then learnt during model training. Since these parameters are only used once (during initialization) to define beta_arr, the computation graph is only created once. The second time the model is run, the computation graph will not contain any information about beta_perturb_coefficients. How, then, are these values learnt?

E.g., I cannot do this in PyTorch without explicitly specifying that I want to retain the computation graph:

import torch
import torch.nn as nn

class Model(nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.beta_arr = nn.Parameter(torch.randn(5))
        # Computed once at construction; the graph of this multiplication
        # is freed after the first backward pass.
        self.some_variable = self.beta_arr * self.beta_arr

    def forward(self, x):
        return self.some_variable * x
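
(For comparison, a minimal sketch of the usual PyTorch workaround, assuming one actually wants the coefficients to be learned: recompute the derived tensor inside forward, so every call rebuilds a fresh graph through the parameter.)

import torch
import torch.nn as nn

class Model(nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.beta_arr = nn.Parameter(torch.randn(5))

    def forward(self, x):
        # Recomputed on every call, so each backward pass reaches
        # beta_arr through a fresh graph and it keeps receiving gradients.
        some_variable = self.beta_arr * self.beta_arr
        return some_variable * x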

vinsis · Oct 17 '22, 18:10

Hi! I'm also working on implementing it, but in TensorFlow 2.x, and I was interested in your question: are these coefficients learned or not? When I wrote some starter code a few days ago, I assumed these coefficients act like constants and are only assigned once at the start to initialize beta_arr (which, by the way, is also constant). In the past I wrote a lot of TensorFlow 1.x code and a little Theano, so I dug deeper to see what is happening.

To answer your question: all variables created in the method generate_beta_arr behave as constants and are not learned during training.
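
The same behaviour is easy to reproduce in PyTorch (a minimal self-contained sketch, not the repo's code): a tensor that requires gradients but is never handed to the optimizer still gets a .grad, yet its values never change.

import torch

w = torch.randn(3, requires_grad=True)  # will be passed to the optimizer
c = torch.randn(3, requires_grad=True)  # analogue of beta_perturb_coefficients
c0 = c.detach().clone()

opt = torch.optim.SGD([w], lr=0.1)      # note: c is not in the parameter list

loss = ((w * c) ** 2).sum()
loss.backward()
opt.step()

print(c.grad is not None)  # True: a gradient was computed for c
print(torch.equal(c, c0))  # True: but c itself was never updated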

As proof, I was able to run the code in Colab, where I created the model and built the computation graph using the Blocks library. In this part (in the train.py file):

step_compute = RMSProp(learning_rate=1e-3, max_scaling=1e10)
algorithm = GradientDescent(
    step_rule=CompositeRule([RemoveNotFinite(), step_compute]),
    parameters=cg.parameters, cost=cost)

You can see that only the variables from cg.parameters are optimized during training. In Colab these variables are: [screenshot of cg.parameters]. I think these are the variables from the convolutional and MLP layers, because beta_perturb_coefficients (and the other variables created in that method) have their own names, which we assign when creating them via theano.shared. So, if we want such a variable to be learned, we should put it into the parameters list of the GradientDescent class.
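
For example, a hypothetical sketch (the name beta_perturb_coefficients below stands for the theano.shared variable created in generate_beta_arr; the repo does not necessarily expose it under this exact name):

# Hypothetical: append the shared variable to the list that GradientDescent
# optimizes, so RMSProp updates it along with the layer parameters.
algorithm = GradientDescent(
    step_rule=CompositeRule([RemoveNotFinite(), step_compute]),
    parameters=list(cg.parameters) + [beta_perturb_coefficients],
    cost=cost)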

Also, there is a TODO note above beta_perturb_coefficients; I think the authors didn't have time to update the code.

To be 100% sure, I ran the training loop several times. At the start: [screenshot of initial parameter values]. After some iterations of training via: [screenshot of the training cell], I got: [screenshot of updated parameter values].

So, the parameters do change, but the beta coefficients do not.
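
The same before/after check is easy to write in PyTorch (a self-contained sketch with a toy model standing in for the repo's network, not its actual training loop):

import torch
import torch.nn as nn

model = nn.Linear(4, 1)                      # stand-in for the learned layers
coeffs = torch.randn(4, requires_grad=True)  # stand-in for beta_perturb_coefficients

opt = torch.optim.RMSprop(model.parameters(), lr=1e-3)  # coeffs omitted on purpose

before_w = model.weight.detach().clone()
before_c = coeffs.detach().clone()

for _ in range(10):
    x = torch.randn(8, 4)
    loss = ((model(x).squeeze(-1) - (x * coeffs).sum(-1)) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

print(torch.allclose(model.weight, before_w))  # expect False: weights changed
print(torch.allclose(coeffs, before_c))        # expect True: coeffs did not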

Hope it helps!

TaplierShiru · Oct 23 '22, 16:10