Refactor mygrad.nnet.layers
Opening this as a draft so I can get some comments. This changes the overall structure of the `mygrad.nnet.layers` module. The changes are:

- Everything that was in the top-level `mygrad.nnet.layers` module has moved to `mygrad.nnet.layers.operations`. This reflects the fact that these were all operator implementations of the layer functionality, but don't do any bookkeeping of parameters.
- The top-level `mygrad.nnet.layers` module now contains helper `Layer` classes that perform the bookkeeping of parameters. This mimics the structure of `mynn.layers`. A sketch of the resulting imports is shown below.
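To make the split concrete, here is a hypothetical view of the resulting imports; the layer-class side is an assumption based on the description above (only `BatchNorm` exists so far in this PR):

```python
# Operator implementations: pure functions, no parameter bookkeeping.
from mygrad.nnet.layers.operations import batchnorm, conv_nd, max_pool

# Layer classes: own and track their parameters, mirroring mynn.layers.
# (Only BatchNorm is added in this PR; other class names are hypothetical.)
from mygrad.nnet.layers import BatchNorm
```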
I went ahead and added a `BatchNorm` layer as an initial example, and will port over `Conv`, `Dense` (or `Linear`, or whatever we want to name it), `Dropout`, and `PReLU` when we're happy with the design. Main questions:
- Does this seem reasonable? Basically mimics how PyTorch has `torch.nn`, where the classes live, and `torch.nn.functional`, where the operations live.
- What kinds of tests are reasonable here? Obviously some that cover instance creation and basic input validation. Anything else we should try to cover?
- How do we want to structure the updates? I'm thinking we could either (1) do it in one fell swoop and add all the layers here, or (2) first just move `mygrad.nnet.layers` -> `mygrad.nnet.layers.operations`, then add in layers one by one, each as its own release.
First off, thanks for working on this! It will be really nice to get mynn merged into mygrad 😄
> Does this seem reasonable? Basically mimics how PyTorch has `torch.nn`, where the classes live, and `torch.nn.functional`, where the operations live.
I am in favor of `mygrad.nnet.layers` -> `mygrad.nnet.layers.operations`, but this shouldn't be a compatibility-breaking change. So `max_pool`, `conv_nd`, etc. should all still be made available in the `mygrad.nnet.layers` namespace. Otherwise this could be a badly-breaking change for a lot of notebooks out there. This would also prevent you from needing to make any changes in `mynn` - it would "just work", even though the library will effectively be obsolete.
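A minimal sketch of how `mygrad/nnet/layers/__init__.py` could keep the old namespace intact after the move; the re-exported names shown here are illustrative rather than a complete list:

```python
# mygrad/nnet/layers/__init__.py (sketch)
# Re-export the operation functions from the new submodule so that
# `from mygrad.nnet.layers import conv_nd, max_pool` keeps working.
from .operations import batchnorm, conv_nd, max_pool  # noqa: F401
```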
After this change, we should check to see if the docs will need to be updated as well. E.g. both the filename and contents of `mygrad.nnet.layers.batchnorm.rst` are path-dependent. But if we maintain the namespaces, as mentioned above, I think things will be OK.
> What kinds of tests are reasonable here? Obviously some that cover instance creation and basic input validation. Anything else we should try to cover?
We don't need anything fancy here; I think the following is sufficient:

- If the layer takes initializers, test that `zeros` and `ones`, respectively, propagate as expected; this will also serve to test that `.parameters()` works as expected (see the self-contained sketch after the code block below).
- For `__call__`, no need for any numerical testing; just verify that the computational graph contains the proper operations. E.g. conv w/ bias should:
```python
conv_layer = ConvNDLayer(..., bias=True)
out = conv_layer(data)  # `data` can be trivial - no need for hypothesis

assert isinstance(out.creator, Add)
conv_out, bias = out.creator.variables
assert bias is conv_layer.bias

assert isinstance(conv_out.creator, ConvND)  # the operation-class, not the layer
x, w = conv_out.creator.variables
assert w is conv_layer.w
assert x is data
```
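For the first bullet (initializers and `.parameters()`), here is a self-contained sketch of the pattern; the `_ToyDense` stand-in and its `weight_initializer` argument are purely illustrative and make no assumptions about the real layers' constructor signatures:

```python
import numpy as np
from mygrad import Tensor


class _ToyDense:
    """Throwaway stand-in layer: y = x @ w, with `w` built by an initializer callable."""

    def __init__(self, d_in, d_out, weight_initializer=np.zeros):
        self.w = Tensor(weight_initializer((d_in, d_out)))

    def __call__(self, x):
        return x @ self.w

    def parameters(self):
        return (self.w,)


def test_initializer_propagates():
    layer = _ToyDense(3, 4, weight_initializer=np.ones)
    # the initializer's values should land in the layer's parameter...
    assert np.all(layer.w.data == 1)
    # ...and .parameters() should expose that exact tensor
    assert any(p is layer.w for p in layer.parameters())
```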
> How do we want to structure the updates? I'm thinking we could either (1) do it in one fell swoop and add all the layers here, or (2) first just move `mygrad.nnet.layers` -> `mygrad.nnet.layers.operations`, then add in layers one by one, each as its own release.
I am in favor of (2)
Perhaps we should make a super-simple base class, just with a `__call__` and `.parameters`, just so we have the benefits of a hierarchy. In a later PR, I might think about adding functionality for saving/loading, but it's not a big deal.
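A minimal sketch of the kind of base class this suggests, assuming an abstract `__call__` plus a `parameters` accessor; the `Module` name and the exact signatures are assumptions, not part of mygrad's existing API:

```python
from abc import ABC, abstractmethod
from typing import Tuple

from mygrad import Tensor


class Module(ABC):  # hypothetical name for the shared base class
    """Defines the layer interface: a forward pass and access to learnable parameters."""

    @abstractmethod
    def __call__(self, x: Tensor) -> Tensor:
        """Perform the layer's forward pass on `x`."""
        raise NotImplementedError

    @abstractmethod
    def parameters(self) -> Tuple[Tensor, ...]:
        """Return the layer's learnable parameters."""
        raise NotImplementedError
```

Saving/loading hooks could then be layered onto this base class in a later PR.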
:+1: this all makes sense. Thanks for reading through! I'll make these changes over the next few(?) days and get the ball rolling on this merge.
It occurred to me that, for testing `__call__` of modules, we should also check that backprop propagates gradients to the module's parameters. No need to check for numerical correctness (checking that the computational graph is as expected is sufficient for that); we simply want to make sure that the parameters aren't initialized as constants (which could happen via `constant=True`, integer-valued biases, or initialization as a numpy array).
So basically:
```python
conv_layer = ConvNDLayer(..., bias=True)
out = conv_layer(data)  # `data` can be trivial - no need for hypothesis

assert isinstance(out.creator, Add)
conv_out, bias = out.creator.variables
assert bias is conv_layer.bias

assert isinstance(conv_out.creator, ConvND)  # the operation-class, not the layer
x, w = conv_out.creator.variables
assert w is conv_layer.w
assert x is data

# no gradients before backprop...
assert w.grad is None
assert bias.grad is None

# ...and backprop must reach the layer's parameters (i.e. they aren't constants)
out.backward()
assert w.grad is not None
assert bias.grad is not None
```