JulioHC00
@CloseChoice Thanks for the paper! I've had a quick look, and it looks like it may be useful. I'm getting mixed results with the `nonlinear_1d` solution, and it just completely...
@CloseChoice I'll have a try at doing that, though I'm still trying to figure out what needs implementing exactly. There's the Taylor expansion of the LayerNorm operation, but what does...
I gave it a try with this

```python
def compute_partial_derivative_matrix_torch(x, alpha, beta, epsilon):
    import torch

    B, N = x.shape
    # Compute the mean and variance along the feature dimension (N)...
```
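For reference, here's a self-contained sketch of what I think the full function looks like — the closed-form per-sample Jacobian of LayerNorm. I've kept the function name and signature from the snippet above, but the body is my own reconstruction, not the actual code from the PR:

```python
import torch

def compute_partial_derivative_matrix_torch(x, alpha, beta, epsilon):
    """Jacobian of y = alpha * (x - mean) / sqrt(var + eps) + beta, per sample.

    x: (B, N), alpha/beta: (N,). Returns (B, N, N) with J[b, i, j] = dy_i / dx_j.
    """
    B, N = x.shape
    mu = x.mean(dim=1, keepdim=True)                  # (B, 1)
    var = x.var(dim=1, unbiased=False, keepdim=True)  # biased variance, as LayerNorm uses
    xc = x - mu                                       # centered inputs, (B, N)
    inv_std = (var + epsilon).rsqrt()                 # 1 / sqrt(var + eps), (B, 1)
    eye = torch.eye(N, dtype=x.dtype, device=x.device)
    # d xhat_i / d x_j = (delta_ij - 1/N - xc_i * xc_j / (N * (var + eps))) / std
    jac = (eye - 1.0 / N
           - xc.unsqueeze(2) * xc.unsqueeze(1)
             * (inv_std ** 2 / N).unsqueeze(2)) * inv_std.unsqueeze(2)
    # y_i = alpha_i * xhat_i + beta_i, so scale row i by alpha_i (beta drops out)
    return alpha.view(1, N, 1) * jac
```

A quick sanity check is that each row of the Jacobian should sum to zero (shifting all inputs by a constant leaves LayerNorm's output unchanged), and it should match `torch.autograd.functional.jacobian` applied to `F.layer_norm`.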
@CloseChoice Will do! If I have time I'll try it later today, if not tomorrow morning and I'll let you know how it goes. Thanks!
@CloseChoice Quick update. It does seem to solve the problem and it works with my example. I'll add it as a test and open a PR. The model at #3881...
I'll try to check these. They definitely don't appear as children, as I checked earlier today. Is the issue then that PyTorch is somehow not managing their gradients in the...
@CloseChoice Sounds good! I've made a PR #3890 for the LayerNorm part (also added Identity as passthrough). Let me know if it looks ok.
I've been doing some tests and it does seem that something like `transpose` will cause issues. I guess `transpose` is just a less flexible `permute`, so it all makes sense.
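For what it's worth, a quick sanity check (my own toy example, not from the repo) showing that `transpose` is just a two-dimension `permute`:

```python
import torch

x = torch.randn(2, 3, 4)
a = x.transpose(1, 2)   # swap dims 1 and 2
b = x.permute(0, 2, 1)  # the same swap written as a full permutation of all dims
assert torch.equal(a, b)
print(a.shape)  # torch.Size([2, 4, 3])
```

So any handling that covers `permute` should cover `transpose` as the special case that swaps exactly two dims.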