NTK4A
Code for the paper: "Tensor Programs II: Neural Tangent Kernel for Any Architecture"
NTK4A issues (2 results)
Thank you very much for this great work! Regarding the calculation in https://github.com/thegregyang/NTK4A/blob/master/Transformer-NTK.ipynb: may I ask why the attention uses the key-query scaling $1/d_{head} = 1/n$ instead of $1/\sqrt{d_{head}}$?
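To make the difference between the two scalings concrete, here is a minimal sketch that does not use the notebook's code. It assumes the query and key vectors have independent standard Gaussian entries, as they would at random initialization, and the helper name `logit_std` is hypothetical. Under that assumption the pre-softmax logit $q \cdot k / \sqrt{n}$ keeps a standard deviation near 1 as $n$ grows, while $q \cdot k / n$ has a standard deviation near $1/\sqrt{n}$ and so concentrates around 0.

```python
import math
import random
import statistics


def logit_std(n, scale, trials=500, seed=0):
    """Empirical std of scale * <q, k> for q, k with i.i.d. N(0, 1) entries.

    Hypothetical helper for illustration only; not part of NTK4A.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(trials):
        # Draw q and k entry by entry and accumulate their dot product.
        dot = sum(rng.gauss(0.0, 1.0) * rng.gauss(0.0, 1.0) for _ in range(n))
        samples.append(scale * dot)
    return statistics.pstdev(samples)


for n in (16, 64, 256):
    s_sqrt = logit_std(n, 1 / math.sqrt(n))  # standard 1/sqrt(d_head) scaling
    s_lin = logit_std(n, 1 / n)              # 1/d_head scaling from the question
    print(f"n={n:4d}  std[q.k/sqrt(n)]={s_sqrt:.3f}  std[q.k/n]={s_lin:.3f}")
```

This only quantifies how the logits behave under each scaling for independent random vectors; it does not by itself explain why the notebook uses $1/n$.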
This PR implements the BiRNN NTK calculation, using summation and concatenation to combine the BiRNN hidden states. The PR adds the Python notebook, edits to utils, and NTK frob distance files...
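The PR's code is not shown here. As a small sketch of how the two ways of combining forward and backward hidden states differ at the level of inner products (which is what a kernel is built from), consider the toy example below. All names (`combine_concat`, `combine_sum`, the toy vectors) are hypothetical and not taken from the PR. Concatenation yields the sum of the forward and backward inner products, while summation additionally picks up the forward-backward cross terms.

```python
def dot(u, v):
    """Plain inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def combine_concat(fwd, bwd):
    """Combine by concatenation: [fwd; bwd]."""
    return list(fwd) + list(bwd)


def combine_sum(fwd, bwd):
    """Combine by elementwise summation: fwd + bwd."""
    return [a + b for a, b in zip(fwd, bwd)]


# Toy forward/backward hidden states for two inputs x and y (made-up numbers).
fx, bx = [1.0, 2.0], [0.5, -1.0]
fy, by = [0.0, 1.0], [2.0, 3.0]

k_concat = dot(combine_concat(fx, bx), combine_concat(fy, by))
k_sum = dot(combine_sum(fx, bx), combine_sum(fy, by))

# Concatenation: <[fx; bx], [fy; by]> = <fx, fy> + <bx, by>
print(k_concat, dot(fx, fy) + dot(bx, by))

# Summation: <fx + bx, fy + by> also includes the cross terms <fx, by> and <bx, fy>
print(k_sum, dot(fx, fy) + dot(fx, by) + dot(bx, fy) + dot(bx, by))
```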