Refactor mygrad.nnet.layers
Opening this as a draft so I can get some comments. This changes the overall structure of the `mygrad.nnet.layers` module. The changes are:

- Everything that was in the top-level `mygrad.nnet.layers` module has moved to `mygrad.nnet.layers.operations`. This reflects the fact that these were all operator implementations of the layer functionality, but don't do any bookkeeping of parameters.
- The top-level `mygrad.nnet.layers` module now contains helper `Layer` classes that perform the bookkeeping of parameters. This mimics the structure of `mynn.layers`. A sketch of the resulting imports is shown below.
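To make the split concrete, here is a hypothetical view of the resulting imports; the layer-class side is an assumption based on the description above (only `BatchNorm` exists so far in this PR):

```python
# Operator implementations: pure functions, no parameter bookkeeping.
from mygrad.nnet.layers.operations import batchnorm, conv_nd, max_pool

# Layer classes: own and track their parameters, mirroring mynn.layers.
# (Only BatchNorm is added in this PR; other class names are hypothetical.)
from mygrad.nnet.layers import BatchNorm
```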
I went ahead and added a `BatchNorm` layer as an initial example, and will port over `Conv`, `Dense` (or `Linear`, or whatever we want to name it), `Dropout`, and `PReLU` when we're happy with the design. Main questions:
- Does this seem reasonable? Basically mimics how PyTorch has `torch.nn`, where the classes live, and `torch.nn.functional`, where the operations live.
- What kinds of tests are reasonable here? Obviously some that cover instance creation and basic input validation. Anything else we should try to cover?
- How do we want to structure the updates? I'm thinking we could either (1) do it in one fell swoop and add all the layers here, or (2) first just move `mygrad.nnet.layers` -> `mygrad.nnet.layers.operations`, then add in layers one by one, each as its own release.
First off, thanks for working on this! It will be really nice to get mynn merged into mygrad 😄
> Does this seem reasonable? Basically mimics how PyTorch has `torch.nn`, where the classes live, and `torch.nn.functional`, where the operations live.
I am in favor of `mygrad.nnet.layers` -> `mygrad.nnet.layers.operations`, but this shouldn't be a compatibility-breaking change. So `max_pool`, `conv_nd`, etc. should all still be made available in the `mygrad.nnet.layers` namespace. Otherwise this could be a badly-breaking change for a lot of notebooks out there. This would also prevent you from needing to make any changes in `mynn` - it would "just work", even though the library will effectively be obsolete.
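A minimal sketch of how `mygrad/nnet/layers/__init__.py` could keep the old namespace intact after the move; the re-exported names shown here are illustrative rather than a complete list:

```python
# mygrad/nnet/layers/__init__.py (sketch)
# Re-export the operation functions from the new submodule so that
# `from mygrad.nnet.layers import conv_nd, max_pool` keeps working.
from .operations import batchnorm, conv_nd, max_pool  # noqa: F401
```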
After this change, we should check to see if the docs will need to be updated as well. E.g. both the filename and contents of `mygrad.nnet.layers.batchnorm.rst` are path-dependent. But if we maintain the namespaces, as mentioned above, I think things will be OK.
> What kinds of tests are reasonable here? Obviously some that cover instance creation and basic input validation. Anything else we should try to cover?
We don't need anything fancy here; I think the following is sufficient:

- If the layer takes initializers, test that `zeros` and `ones`, respectively, propagate as expected; this will also serve to test that `.parameters()` works as expected (see the self-contained sketch after the code block below).
- For `__call__`, no need for any numerical testing; just verify that the computational graph contains the proper operations. E.g. conv w/ bias should:
```python
conv_layer = ConvNDLayer(..., bias=True)
out = conv_layer(data)  # `data` can be trivial - no need for hypothesis

assert isinstance(out.creator, Add)
conv_out, bias = out.creator.variables
assert bias is conv_layer.bias

assert isinstance(conv_out.creator, ConvND)  # the operation-class, not the layer
x, w = conv_out.creator.variables
assert w is conv_layer.w
assert x is data
```
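For the first bullet (initializers and `.parameters()`), here is a self-contained sketch of the pattern; the `_ToyDense` stand-in and its `weight_initializer` argument are purely illustrative and make no assumptions about the real layers' constructor signatures:

```python
import numpy as np
from mygrad import Tensor


class _ToyDense:
    """Throwaway stand-in layer: y = x @ w, with `w` built by an initializer callable."""

    def __init__(self, d_in, d_out, weight_initializer=np.zeros):
        self.w = Tensor(weight_initializer((d_in, d_out)))

    def __call__(self, x):
        return x @ self.w

    def parameters(self):
        return (self.w,)


def test_initializer_propagates():
    layer = _ToyDense(3, 4, weight_initializer=np.ones)
    # the initializer's values should land in the layer's parameter...
    assert np.all(layer.w.data == 1)
    # ...and .parameters() should expose that exact tensor
    assert any(p is layer.w for p in layer.parameters())
```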
> How do we want to structure the updates? I'm thinking we could either (1) do it in one fell swoop and add all the layers here, or (2) first just move `mygrad.nnet.layers` -> `mygrad.nnet.layers.operations`, then add in layers one by one, each as its own release.
I am in favor of (2)
Perhaps we should make a super-simple base class, just with a `__call__` and `.parameters`, just so we have the benefits of a hierarchy. In a later PR, I might think about adding functionality for saving/loading, but it's not a big deal.
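A minimal sketch of the kind of base class this suggests, assuming an abstract `__call__` plus a `parameters` accessor; the `Module` name and the exact signatures are assumptions, not part of mygrad's existing API:

```python
from abc import ABC, abstractmethod
from typing import Tuple

from mygrad import Tensor


class Module(ABC):  # hypothetical name for the shared base class
    """Defines the layer interface: a forward pass and access to learnable parameters."""

    @abstractmethod
    def __call__(self, x: Tensor) -> Tensor:
        """Perform the layer's forward pass on `x`."""
        raise NotImplementedError

    @abstractmethod
    def parameters(self) -> Tuple[Tensor, ...]:
        """Return the layer's learnable parameters."""
        raise NotImplementedError
```

Saving/loading hooks could then be layered onto this base class in a later PR.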
:+1: this all makes sense. Thanks for reading through! I'll make these changes over the next few(?) days and get the ball rolling on this merge.
It occurred to me that, for testing `__call__` of modules, we should also check that backprop propagates gradients to the module's parameters. No need to check for numerical correctness (checking that the computational graph is as expected is sufficient for that); we simply want to make sure that the parameters aren't initialized as constants (which could happen via `constant=True`, integer-valued biases, or initialization as a numpy array).
So basically:
```python
conv_layer = ConvNDLayer(..., bias=True)
out = conv_layer(data)  # `data` can be trivial - no need for hypothesis

assert isinstance(out.creator, Add)
conv_out, bias = out.creator.variables
assert bias is conv_layer.bias

assert isinstance(conv_out.creator, ConvND)  # the operation-class, not the layer
x, w = conv_out.creator.variables
assert w is conv_layer.w
assert x is data

# no gradients before backprop...
assert w.grad is None
assert bias.grad is None

# ...and backprop must reach the layer's parameters (i.e. they aren't constants)
out.backward()
assert w.grad is not None
assert bias.grad is not None
```