NeuralPDE.jl
[WIP] Adaptive activation functions
Work-in-progress for #355
I agree with Chris that most of this work should not go here, since it is mostly neural-architecture related. It could make sense to add it to the DiffEqFlux repository, as was mentioned in the issue for this method. Alternatively, we could make a new file in this repo such as `networks.jl` and have it go there, along with other neural architectures that are implemented with PINNs in mind. If you want to include the adaptive activation function from the first paper referenced (https://arxiv.org/pdf/1906.01170) and also the second paper (https://arxiv.org/pdf/1909.12228), then I think it would make sense to write a single function such as
```julia
function AdaptiveActivationFeedForwardNetwork(hyperparameters...)
    function slope_recovery_loss_func(phi, θ, p)
        # calculate the Slope Recovery loss here as a function of the θ
        # parameters that are generated for this network
        return regularizer_loss
    end
    return (network = FastChain(...), loss_func = slope_recovery_loss_func)
end
```
where you return a NamedTuple of either a `DiffEqFlux.FastChain` or a `Flux.Chain`, together with the function that will compute the Slope Recovery loss from the second paper. The user would then pass that network to the PINN and the Slope Recovery loss function to the `additional_loss` input of the PINN. If it ends up being a huge improvement to the learning process overall, then it might make sense to move it into the internals of the PINN, but for now I think it makes sense to generate the network and feed the Slope Recovery regularizer in through the external interface. I don't think any changes to the internals of the PINN implementation will be required to implement these algorithms.
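For concreteness, here is a rough sketch of what the user-facing wiring could look like. The constructor and its NamedTuple fields are just the proposal above (nothing like this exists yet), and `pde_system` is assumed to be a ModelingToolkit PDESystem set up as in the existing NeuralPDE tutorials:

```julia
using NeuralPDE, GalacticOptim
using Flux: ADAM

# Hypothetical constructor from the proposal above: returns the network and its
# Slope Recovery regularizer as a NamedTuple.
net = AdaptiveActivationFeedForwardNetwork(2, 1, 16, 4; nonlinearity = tanh)

# The network goes into the PINN as usual, and the regularizer is attached
# through the existing `additional_loss` keyword of `PhysicsInformedNN`.
discretization = PhysicsInformedNN(net.network, QuadratureTraining();
                                   additional_loss = net.loss_func)
prob = discretize(pde_system, discretization)   # `pde_system` defined elsewhere
res = GalacticOptim.solve(prob, ADAM(0.03); maxiters = 1000)
```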
The hyperparameters you would want to include (as arguments to that function) are at least:
- Number of hidden layers
- Dimension of each hidden layer
- Number of inputs and number of outputs
- Which nonlinearity to apply after the scaling from the paper
- Feedforward network hyperparameters such as initial parameter distributions
- Which of the three different forms of adaptive activation to use
- `n` value for the algorithm: the initial `a` should be scaled such that `n*a = 1` for each `a` parameter
and possibly others that I didn't think of, which will become apparent to you during implementation; a rough sketch of what such a signature could look like is below.
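For illustration only, a hypothetical signature collecting those hyperparameters, with the initial `a` scaled so that `n*a = 1` (none of these argument names exist yet, and how the adaptive-activation variant gets selected is left open here):

```julia
using DiffEqFlux

# Hypothetical signature collecting the hyperparameters listed above;
# `n` is the fixed scale factor from the papers.
function AdaptiveActivationFeedForwardNetwork(in_dim, out_dim, hidden_dim, num_hidden;
                                              nonlinearity = tanh,
                                              n = 10)
    a0 = 1 / n   # initialize each adaptive slope `a` so that n * a = 1
    # ... build a FastChain whose activations are nonlinearity.(n .* a .* x),
    # plus the matching Slope Recovery loss over the `a` parameters ...
end
```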
Also, I think there's an issue with your line-ending commit style, and that's why almost every line shows a change. Are you committing with Windows line endings? I believe we're using Unix line endings, and having the two differ would result in almost every line being changed constantly (like what is being observed here). I think it's an option in your git config settings (e.g. core.autocrlf).
Thanks for the pointers! I think I have a clearer idea of what to do now. I'll create a `networks.jl` in this repo and start working on that.
Also, yes, I had been using Windows line endings; I'll switch to Unix line endings for further commits.
Also, here's an example of using the `additional_loss` interface for including your own loss terms in the PINN optimization:
https://neuralpde.sciml.ai/stable/pinn/parm_estim/
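The extra loss term in that tutorial has the same `(phi, θ, p)` signature as the sketch above. A minimal placeholder showing just the shape (not the Slope Recovery term itself) might be:

```julia
# A minimal `additional_loss`-style function: it receives the trial solution
# `phi`, the current network parameters `θ`, and any extra hyperparameters `p`,
# and returns a scalar that is added to the PINN's loss.
function l2_regularizer(phi, θ, p)
    return 1e-4 * sum(abs2, θ)
end
```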
I have rewritten the skeleton in a new file `networks.jl` and removed the previous changes made to `pinns_pde_solve.jl`.
I wanted to ask:
- In the main function `AdaptiveActivationFeedForwardNetwork`, we need the user to specify which type of adaptive activation should be used. How should the parameter controlling this be written? I've sketched one option below.
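One option I've been considering is dispatching on small marker types for the three forms from the paper; all of these names are hypothetical:

```julia
# Hypothetical marker types for the three forms of adaptive activation
abstract type AdaptiveActivationForm end
struct GlobalAdaptive <: AdaptiveActivationForm end      # one `a` shared by the whole network
struct LayerwiseAdaptive <: AdaptiveActivationForm end   # one `a` per hidden layer
struct NeuronwiseAdaptive <: AdaptiveActivationForm end  # one `a` per neuron

# The constructor could then dispatch on the chosen form, e.g.
# AdaptiveActivationFeedForwardNetwork(NeuronwiseAdaptive(), in_dim, out_dim, ...)
```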
I've been really busy with a project deadline on Tuesday; I should be able to do a thorough review and give guidance after that.