axon
Add Eli & Rishi's correlation-based learning mechanism
http://arxiv.org/abs/2011.07334 -- the key equation is (Var_i + Var_j - 2 Covar_ij): maximize the variance of the sender and receiver while minimizing the covariance between the two.
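As a side note (my gloss, not a claim from the paper itself): by the standard variance-of-a-difference identity, this expression is just the variance of the difference between sender and receiver activity, so driving it up decorrelates the pair while keeping both units active:

$$
\mathrm{Var}_i + \mathrm{Var}_j - 2\,\mathrm{Covar}_{ij} \;=\; \mathrm{Var}(x_i - x_j)
$$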
In my experiments, I updated the SWt (structural, spine, slow) weight in the slower outer-loop cycle as a function of accumulated Var and Covar stats (computed using simple running-average act - mean values). This produces a graded, pruning-like function: because SWt multiplies the regular "fast" learned weights, reducing it toward 0 yields an effective "soft" form of pruning.
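A rough sketch of what this looks like -- names, the time constant, and the per-synapse layout here are hypothetical illustrations, not the actual axon API:

```go
// Hypothetical illustration, not the actual axon API: per-synapse
// running-average Var / Covar stats, plus the multiplicative role of SWt.
package correl

// SynCor holds running-average statistics for one synapse between
// sender i and receiver j.
type SynCor struct {
	VarI  float32 // running variance of sender activity (around its mean)
	VarJ  float32 // running variance of receiver activity
	Covar float32 // running covariance between sender and receiver
}

// Tau is the running-average time constant in trials (illustrative value).
const Tau = 100.0

// Accumulate updates the running stats from one trial, given the
// mean-centered activities (act - running mean) of sender and receiver.
func (sc *SynCor) Accumulate(di, dj float32) {
	dt := float32(1.0 / Tau)
	sc.VarI += dt * (di*di - sc.VarI)
	sc.VarJ += dt * (dj*dj - sc.VarJ)
	sc.Covar += dt * (di*dj - sc.Covar)
}

// EffWt shows why reducing SWt toward 0 is a "soft" form of pruning:
// the structural weight multiplies the fast learned weight.
func EffWt(swt, fastWt float32) float32 {
	return swt * fastWt
}
```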
Having worked through the logic here more carefully, I realized I had an error in the initial implementation: it missed the factor of 2 on Covar_ij. I also realized that the pruning logic makes more sense if it only includes the negative component of this value -- otherwise there is a Hebbian-like, variance-increasing force constantly working to increase the weights, which is not present in the pruning-only version.
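A corresponding sketch of the corrected slow update, with the factor of 2 and an option to keep only the negative (pruning) component; the [0, 1] clamp is an assumption, not the actual axon SWt bounds:

```go
// Hypothetical illustration of the corrected slow outer-loop SWt update.
package correl

// SWtDelta computes the slow SWt change from the accumulated running stats,
// now including the factor of 2 on Covar. With pruneOnly, only the negative
// component is used, giving a pure pruning force and omitting the
// Hebbian-like, variance-increasing force from the positive component.
func SWtDelta(varI, varJ, covar, lrate float32, pruneOnly bool) float32 {
	d := varI + varJ - 2*covar // key quantity: Var_i + Var_j - 2 Covar_ij
	if pruneOnly && d > 0 {
		d = 0
	}
	return lrate * d
}

// ApplySWt adds the delta to SWt and clamps it to [0, 1] (assumed range),
// so a value near 0 effectively (softly) prunes the synapse.
func ApplySWt(swt, delta float32) float32 {
	swt += delta
	if swt < 0 {
		swt = 0
	} else if swt > 1 {
		swt = 1
	}
	return swt
}
```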
Using both the positive and negative components at a 0.1 learning rate appears to work well in the large-scale LVis object recognition model -- it significantly reduces the strength of the top-5 PCA components while driving a solid number of strong PCA components throughout learning. The output layer dynamics still need fixing, but decoding shows continued learning throughout!