dni-pytorch
BasicSynthesizer last layer bias not zero
```python
# zero-initialize the last layer, as in the paper
if n_hidden > 0:
    init.constant(self.layers[-1].weight, 0)
else:
    init.constant(self.input_trigger.weight, 0)
    if context_dim is not None:
        init.constant(self.input_context.weight, 0)
```
The BasicSynthesizer class zero-initializes the weights of its final layer, but it does not zero the biases. From the paper:
"The final regression layer of all synthetic gradient models are initialised with zero weights and biases, so initially, zero synthetic gradient is produced."
In my experiments, I observed that the initial synthetic gradient was not zero but was equal to the bias term.
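For reference, here is a minimal sketch of zeroing both the weight and the bias of the final regression layer, together with a quick check of the behaviour described above. The helper name `zero_init_regression_layer` is hypothetical, and the snippet uses the in-place `init.constant_` variants from current PyTorch, whereas the repository itself calls the older `init.constant`:

```python
import torch
import torch.nn as nn
from torch.nn import init


def zero_init_regression_layer(layer: nn.Linear) -> None:
    """Zero both weight and bias so the layer initially outputs zeros."""
    init.constant_(layer.weight, 0)
    if layer.bias is not None:  # guard layers built with bias=False
        init.constant_(layer.bias, 0)


# With weight and bias both zeroed, the initial output is zero for any input,
# matching "initially, zero synthetic gradient is produced" from the paper.
layer = nn.Linear(16, 16)
zero_init_regression_layer(layer)
assert torch.all(layer(torch.randn(4, 16)) == 0)

# With only the weight zeroed (the current behaviour), the initial output
# equals the bias vector broadcast over the batch.
layer = nn.Linear(16, 16)
init.constant_(layer.weight, 0)
assert torch.allclose(layer(torch.randn(4, 16)), layer.bias.expand(4, 16))
```

In `BasicSynthesizer.__init__`, the same idea would amount to adding `init.constant(self.layers[-1].bias, 0)` next to the existing weight initialization (and the corresponding calls for `self.input_trigger` / `self.input_context` in the `else` branch).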