dni-pytorch
Decoupled Neural Interfaces using Synthetic Gradients for PyTorch
```python
# zero-initialize the last layer, as in the paper
if n_hidden > 0:
    init.constant(self.layers[-1].weight, 0)
else:
    init.constant(self.input_trigger.weight, 0)
    if context_dim is not None:
        init.constant(self.input_context.weight, 0)
```

The BasicSynthesizer class sets all...
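The point of this zero initialization is that a synthesizer whose last linear layer has zero weights and biases outputs exactly zero at the start of training, so the initial synthetic gradients are zero and do not perturb the base network. A minimal self-contained sketch (not the library's `BasicSynthesizer`; the layer sizes are arbitrary, and `nn.init.constant_` is the current spelling of the older `init.constant`):

```python
import torch
import torch.nn as nn

# A stand-in synthesizer: a small MLP whose final layer is
# zero-initialized, as the DNI paper prescribes.
synthesizer = nn.Sequential(
    nn.Linear(4, 8),
    nn.ReLU(),
    nn.Linear(8, 4),
)
last = synthesizer[-1]
nn.init.constant_(last.weight, 0)  # zero weights of the output layer
nn.init.constant_(last.bias, 0)    # zero bias too

x = torch.randn(3, 4)
out = synthesizer(x)
# With a zeroed output layer, the synthetic gradient estimate is
# exactly zero regardless of the input.
print(torch.all(out == 0).item())
```

Until the synthesizer has seen some true gradients to regress against, the base module therefore trains as if no synthetic-gradient signal were attached.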
What it says on the tin.
As the issue title says: my base module is a 3-layer GRU, and the synthesizer module is another RNN. I want to train the base module in `BPTT` mode without synthetic gradients,...