Results 12 issues of Wuyang

Could you provide more instructions on how to use the yaml files? Specifically: 1) do we need to use tmuxp to load the yaml file? 2) am I correct that:...

Is this switchable norm able to learn a hybrid normalization layer like the one used in [IBN-Net](https://arxiv.org/abs/1807.09441), given that IBN-Net can improve domain generalization performance? How could we train the switchable norm...

Hi @MerkulovDaniil! Thanks for preparing this material on optimization! Just wondering whether you have prepared solutions to the exercises? Thanks!

Thank you very much for this great work! Regarding the calculation here: https://github.com/thegregyang/NTK4A/blob/master/Transformer-NTK.ipynb May I ask why the attention with key-query scaling $1/d_{head} = 1/n$ is used, instead of $1/\sqrt{d_{head}}$?
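To make the two scalings in this question concrete, here is a minimal sketch that compares the empirical variance of a single attention logit under each choice. It assumes query and key entries are i.i.d. standard Gaussians, which is an illustrative assumption and not something taken from the linked notebook; the helper name `logit_variance` is hypothetical. It does not explain why the notebook uses $1/d_{head}$; it only shows how differently the two scalings behave as $d_{head}$ grows.

```python
import random
import statistics


def logit_variance(d_head, scale, trials=2000, seed=0):
    """Empirical variance of scale * <q, k> for q, k with i.i.d. N(0, 1) entries.

    Hypothetical helper for illustration only; the i.i.d. Gaussian
    assumption is ours, not the notebook's.
    """
    rng = random.Random(seed)
    logits = []
    for _ in range(trials):
        q = [rng.gauss(0.0, 1.0) for _ in range(d_head)]
        k = [rng.gauss(0.0, 1.0) for _ in range(d_head)]
        logits.append(scale * sum(qi * ki for qi, ki in zip(q, k)))
    return statistics.pvariance(logits)


d_head = 64
# Standard 1/sqrt(d_head) scaling: logit variance stays close to 1.
var_sqrt = logit_variance(d_head, 1.0 / d_head ** 0.5)
# 1/d_head scaling (the one asked about): logit variance shrinks like 1/d_head.
var_lin = logit_variance(d_head, 1.0 / d_head)

print(f"1/sqrt(d_head) scaling: variance ~ {var_sqrt:.4f}")
print(f"1/d_head scaling:       variance ~ {var_lin:.4f}")
```

Under this assumption the unscaled dot product has variance $d_{head}$, so the $1/\sqrt{d_{head}}$ scaling keeps the logits at unit variance, while the $1/d_{head}$ scaling drives them toward zero as the head dimension increases.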

Thanks for this great work! Why does the jvp have to be calculated via NTK-parameterized linear layers? What if a vanilla Conv2d were used for the jvp calculation instead? https://github.com/fmu2/gradfeat20/blob/master/src/model.py#L106

# Contributing to CSrankings
Thanks for contributing to CSrankings! Please read and indicate you agree with **all** these guidelines to get your pull request accepted. Note that pull requests may...

Dear authors: I am new to TDA. May I ask if this is the correct place to calculate the pair-wise distance between points in my dataset? (and the pair-wise...

**Describe the bug** When running the transformer-XL example on enwik8, the log shows that there are only 204 unique tokens (vocabulary size) in the enwik8 training set. **To Reproduce** Steps to reproduce...

Thank you for the great effort! 1. Can the reason for using Variables instead of nn.Parameters be explained by [this post](https://discuss.pytorch.org/t/nn-parameter-doesnt-retain-grad-fn/29214)? 2. I think relying on Variables is...

Dear Authors: Congratulations on this great work! May I kindly ask when the pretraining code will be available? Thank you very much!