
Implement mean-pooling neural network operation

Open rsokl opened this issue 6 years ago • 2 comments

Okay, so this might not exactly be a "good first issue" - it is a little more advanced, but is still very much accessible to newcomers.

Similar to the mygrad.nnet.max_pool function, I would like there to be a mean-pooling layer. That is, a convolution-style window is strided over the input, and the mean is computed for each window. E.g., the following shows how mean-pooling should work on a shape-(3, 3) tensor, using a shape-(2, 2) pooling window strided with a step-size of 1 (both along the rows and the columns).

>>> import mygrad as mg
>>> x = mg.Tensor([[0., 1., 2.],
...                [3., 4., 5.],
...                [6., 7., 8.]])

# Forward Pass
>>> out = mean_pool(x, pool=(2, 2), stride=1)
>>> out
Tensor([[2., 3.],
        [5., 6.]])

# Backprop
>>> out.sum().backward()  # must backprop from a scalar, thus we sum `out`
>>> x.grad
array([[0.25, 0.5 , 0.25],
       [0.5 , 1.  , 0.5 ],
       [0.25, 0.5 , 0.25]])

Like max_pool, this function should accommodate N-dimensional tensors. mygrad.sliding_window_view makes short work of this. The function basically boils down to taking the appropriate sliding-window view of the underlying numpy array of the input tensor, and using numpy.mean to take the average over the trailing N dimensions that you want to pool over. This is much easier than max-pooling, since numpy.mean is able to accept multiple axes.

Try starting with the forward pass for the 1D and 2D cases only. I can help you generalize to N-dimensions if you get stuck. I am also happy to help derive the proper back-propagation for this.
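
For instance, here's a rough forward-only sketch of the 2D case. It uses numpy's own sliding_window_view (NumPy 1.20+) as a stand-in; the real thing would use mygrad.sliding_window_view and would need to handle strides and N dimensions more carefully, so treat it purely as a starting point:

>>> import numpy as np
>>> from numpy.lib.stride_tricks import sliding_window_view

>>> def mean_pool_2d_forward(x, pool, stride=1):
...     """Forward pass only: mean-pool a 2D numpy array with a (ph, pw) window."""
...     windows = sliding_window_view(x, pool)  # shape: (H - ph + 1, W - pw + 1, ph, pw)
...     windows = windows[::stride, ::stride]   # apply the stride along the two grid axes
...     return windows.mean(axis=(-2, -1))      # average over the trailing pooling axes

>>> mean_pool_2d_forward(np.arange(9.).reshape(3, 3), pool=(2, 2), stride=1)
array([[2., 3.],
       [5., 6.]])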

rsokl avatar Jan 22 '19 22:01 rsokl

Do you think it would be possible to provide a general-purpose pool function that can take in an arbitrary expression? It would be nice, as you could then do:

y = mg.pool(x, pool=(2, 2), stride=2, fn=mg.mean)
z = mg.pool(x, pool=(2, 2), stride=2, fn=mg.max)
w = mg.pool(x, pool=(2, 2), stride=2, fn=mg.sum)

and thereby avoid a lot of bloat.

davidmascharka avatar Dec 12 '19 18:12 davidmascharka

Warning: stream-of-consciousness ahead.

I have been thinking about this. And it all basically comes down to this line:

        np.add.at(dx, tuple(index), grad.reshape(*x.shape[:-num_pool], -1))

dx is the gradient to write to, index stores the locations at which to update the gradient (with the pooling axes flattened), and grad is the gradient being backpropped (also flattened to be commensurate with the contents of index).

The issue, then, is that max and min only accumulate one value per window, whereas sum/mean broadcast out to the entire window. It isn't super clear to me how to compute index a priori in a way that accommodates these various cases.
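
To make that asymmetry concrete, here's a toy 1D example (hand-picked indices, not MyGrad's actual internals): with pool=2 and stride=2, max routes each window's incoming gradient to a single position, whereas mean spreads grad/window_size over every position in the window.

>>> import numpy as np
>>> x = np.array([1., 5., 2., 0.])
>>> grad_out = np.array([1., 1.])  # incoming gradient: one value per window

>>> # max-pool backprop: scatter each window's gradient to its argmax position
>>> dx_max = np.zeros_like(x)
>>> np.add.at(dx_max, np.array([1, 2]), grad_out)  # flat indices of the window maxima
>>> dx_max
array([0., 1., 1., 0.])

>>> # mean-pool backprop: spread grad / window_size over every position in each window
>>> dx_mean = np.zeros_like(x)
>>> window_positions = np.array([[0, 1], [2, 3]])  # all flat indices in each window
>>> per_element = np.repeat(grad_out / 2, 2)       # each window spreads grad/2 to its 2 entries
>>> np.add.at(dx_mean, window_positions.ravel(), per_element)
>>> dx_mean
array([0.5, 0.5, 0.5, 0.5])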

Ultimately I think this comes down to: is there some useful, not-totally-inefficient way to fuse an operation with sliding-window-view that supports backprop? The totally naive way of doing this would be: if you operate on N windows, then we form a computational graph with N nodes that we backprop through. Clearly this is just too unwieldy.

It would be really neat to do some internal op-fusion with sliding-window-view that internally invokes the op's backprop machinery over the windows in a not-dumb way. This would be a super super nice win. And mygrad would actually kind of be the best. I should really think about this.

@petarmhg you might be interested in this convo.

rsokl avatar Dec 12 '19 18:12 rsokl