neural-fortran
Fc2d layer
Fully-Connected Layer for 2D Shapes
Also known as MLP, FeedForward, etc. A common component of neural networks, including transformers. The idea is very simple: first linear transformation => activation => second linear transformation.
This is the last piece of the transformer architecture. Once #203, #205, and this one are merged, we can start adding transformer encoders and decoders.
Python reference: https://github.com/OneAdder/neural-fortran-references/blob/main/fc2d_layer.py
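For readers who want a self-contained illustration without following the link, here is a minimal NumPy sketch of the idea; the function and variable names are illustrative only, not the actual neural-fortran API:

```python
import numpy as np

def fc2d_forward(x, w1, b1, w2, b2, activation=np.tanh):
    """First linear transformation => activation => second linear transformation."""
    # x: (sequence_length, model_dimension) -- a 2D input such as a sequence of embeddings
    hidden = activation(x @ w1 + b1)
    return hidden @ w2 + b2

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))                       # 4 positions, model dimension 8
w1 = rng.normal(size=(8, 16)); b1 = np.zeros(16)  # hidden dimension 16
w2 = rng.normal(size=(16, 8)); b2 = np.zeros(8)
print(fc2d_forward(x, w1, b1, w2, b2).shape)      # (4, 8)
```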
Problem
The softmax derivative here is incorrect. This implementation is actually the derivative of the logistic function, which is not equivalent to the softmax derivative. The derivative of softmax w.r.t. each element of the input requires computing the Jacobian matrix:
$jacobian_{i, j} = \begin{pmatrix} \frac{dsoftmax_1}{dx_1} & \dots & \frac{dsoftmax_1}{dx_j} \\ \vdots & \ddots & \vdots \\ \frac{dsoftmax_i}{dx_1} & \dots & \frac{dsoftmax_i}{dx_j} \end{pmatrix}$

$\frac{dsoftmax}{dx} = gradient \times jacobian$
Where:
- $\frac{dsoftmax_i}{dx_j} = softmax(x_j) \cdot (\alpha - softmax(x_i))$, where $\alpha$ is $1$ for $i = j$ and $0$ otherwise
- $x$ is the input sequence
Similar to my implementation for MultiHead Attention here.
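For illustration, a minimal NumPy sketch of the backward pass described above; the `softmax` and `softmax_backward` names exist only for this example, and the actual fix belongs in the Fortran activation code:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def softmax_backward(x, gradient):
    # Jacobian of softmax w.r.t. the input: J[i, j] = softmax(x_i) * (delta_ij - softmax(x_j))
    s = softmax(x)
    jacobian = np.diag(s) - np.outer(s, s)
    # dsoftmax/dx = gradient x jacobian (the vector-Jacobian product)
    return gradient @ jacobian

x = np.array([1.0, 2.0, 3.0])
upstream = np.array([0.1, -0.2, 0.3])
print(softmax_backward(x, upstream))
```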
Possible Solutions
It is not easy to resolve because `activation_function` doesn't accept the input, so:
- Do nothing; I added a crutch that throws an error when `softmax` is passed as the activation
- Make softmax a layer without parameters rather than an activation function; this will work (see the sketch after this list)
- Make a wrapper `activation_layer` that extends `base_layer` and accepts an activation function
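To illustrate the second option, a parameterless softmax layer could look roughly like the following NumPy sketch; `SoftmaxLayer` is hypothetical, not the actual neural-fortran type. The point is that a layer keeps its own output around, so its backward pass can form the vector-Jacobian product, which the current `activation_function` interface cannot do:

```python
import numpy as np

class SoftmaxLayer:
    """Hypothetical parameterless layer: softmax as a layer rather than an activation function."""

    def forward(self, x):
        # x: (sequence_length, model_dimension); softmax over the last axis
        e = np.exp(x - x.max(axis=-1, keepdims=True))
        self.output = e / e.sum(axis=-1, keepdims=True)
        return self.output

    def backward(self, gradient):
        # Per-row vector-Jacobian product: dL/dx_j = sum_i g_i * s_i * (delta_ij - s_j)
        s = self.output
        dot = (gradient * s).sum(axis=-1, keepdims=True)
        return s * (gradient - dot)

layer = SoftmaxLayer()
x = np.arange(6, dtype=float).reshape(2, 3)
y = layer.forward(x)
grad_in = layer.backward(np.ones_like(y))
print(y.shape, grad_in.shape)  # (2, 3) (2, 3)
```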
@OneAdder Please forgive my ignorance here. Could you please clarify the distinction between the fc2d layer and the dense layer?
@jvdp1 The terms are not particularly well defined here in practice. This is also sometimes called dense. The mathematical distinction is that dense in neural-fortran is linear transformation => activation while my fc2d is linear transformation => activation => linear transformation. Theoretically the same as dense(some_activation) => dense(linear_activation).
The key difference is from a software development perspective: fc2d works with 2D shapes, while dense can't handle those.
Thanks @OneAdder for starting this. From your explanation I understand what this does.
Rather than introducing a composition of multiple operations as a single layer, I suggest that we build a basic building block first, and then if needed, we can add a "shallow-wrapper" layer around those elementary layers.
Specifically, rather than introducing here a new layer that does "first linear transformation => activation => second linear transformation", I suggest we simply introduce a dense2d layer which is the same as dense but that works on 2-d inputs.
Then, the operation proposed here would be: dense2d(activation) => dense2d(linear). We already have a linear activation function which allows using existing dense layers as linear layers. What do you think?
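To make the proposed equivalence concrete, here is a rough NumPy sketch; the `dense2d` name and signature are hypothetical and do not reflect the final API:

```python
import numpy as np

def dense2d(x, w, b, activation=lambda z: z):
    # Same as dense, but x has shape (sequence_length, input_dim) instead of (input_dim,);
    # the default activation is linear (identity)
    return activation(x @ w + b)

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
w1 = rng.normal(size=(8, 16)); b1 = np.zeros(16)
w2 = rng.normal(size=(16, 8)); b2 = np.zeros(8)

# fc2d(x) as the composition dense2d(activation) => dense2d(linear)
hidden = dense2d(x, w1, b1, activation=np.tanh)
out = dense2d(hidden, w2, b2)
print(out.shape)  # (4, 8)
```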
And thanks for pointing out the incorrect softmax derivative. I don't even recall how and why I did that.
@milancurcic It makes sense. I can do it. Should we merge this and then refactor it or the other way around?
BTW, I think we should actually make a consistent API for combined layers. Something along the lines of the following: base_layer is inherited by combined_layer, which implements the get and set methods for params and gradients, pointing to the params of the layers that make up the combined layer. Combined layers then extend the combined_layer class.
Thanks, @OneAdder. If you agree, I suggest that here we simply provide a 2-d version of an existing dense layer (I suggest dense2d) which accepts an activation function as well as a linear activation as a special case.
Good ideas for combined_layer but let's discuss it in a separate issue. I opened #217.
Actually, since we already have linear2d, should we just refactor it to accept an activation function and thus call it dense2d? Then, creating dense2d(..., activation="linear") would give us linear2d.
@milancurcic on it