Wuyang issues

Results: 12 issues of Wuyang

Usage of yaml files

Could you provide more instructions on how to use the yaml files? Specifically: 1) do we need to use tmuxp to load the yaml file? 2) am I correct that:...

Is this switchable norm able to learn a hybrid normalization layer like the one used in [IBN-Net](https://arxiv.org/abs/1807.09441), given that IBN-Net can improve domain generalization performance? How could we train the switchable norm...

Solution to exercises

Hi @MerkulovDaniil! Thanks for preparing this material on optimization! I just wonder whether you have prepared solutions to the exercises. Thanks!

About transformer attention scaling

Thank you very much for this great work! Regarding the calculation here: https://github.com/thegregyang/NTK4A/blob/master/Transformer-NTK.ipynb May I ask why the attention with key-query scaling $1/d_{head} = 1/n$ is used, instead of $1/\sqrt{d_{head}}$?
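For context on the two scalings in the question, here is a minimal sketch of how each one affects the size of a single attention logit. It assumes query and key entries are i.i.d. standard normal, which is a simplification for illustration and is not taken from the linked notebook. The function name `scaled_logit_std` and the choice `d_head = 64` are hypothetical.

```python
import math
import random


def scaled_logit_std(d_head, scale, trials=2000, seed=0):
    """Empirical std of scale * <q, k> for i.i.d. standard normal q and k."""
    rng = random.Random(seed)
    values = []
    for _ in range(trials):
        # Dot product of two independent Gaussian vectors of length d_head.
        dot = sum(rng.gauss(0, 1) * rng.gauss(0, 1) for _ in range(d_head))
        values.append(scale * dot)
    mean = sum(values) / trials
    variance = sum((v - mean) ** 2 for v in values) / trials
    return math.sqrt(variance)


d_head = 64  # assumed head dimension, for illustration only
std_sqrt = scaled_logit_std(d_head, 1 / math.sqrt(d_head))  # 1/sqrt(d_head) scaling
std_lin = scaled_logit_std(d_head, 1 / d_head)              # 1/d_head scaling

print(f"1/sqrt(d_head): std ~ {std_sqrt:.3f}")
print(f"1/d_head:       std ~ {std_lin:.3f}")
```

Under this assumption the $1/\sqrt{d_{head}}$ logit keeps a standard deviation near 1, while the $1/d_{head}$ logit shrinks like $1/\sqrt{d_{head}}$. The sketch only shows the size difference for independent inputs; it does not say which scaling is appropriate once queries and keys become correlated.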

Why does the jvp have to be calculated via NTK-parameterized linear layers?

Thanks for this great work! Why does the jvp have to be calculated via NTK-parameterized linear layers? What if we just use a vanilla Conv2d for the jvp calculation? https://github.com/fmu2/gradfeat20/blob/master/src/model.py#L106

add new assistant professors at Simon Fraser University

# Contributing to CSrankings Thanks for contributing to CSrankings! Please read and indicate you agree with **all** these guidelines to getting your pull request accepted. Note that pull requests may...

Distance between points

Dear authors: I am new to TDA. May I ask if this is the correct place to calculate the pair-wise distance between points in my dataset? (and the pair-wise...

Only 204 unique tokens (vocabulary size) in enwik8 (transformer-XL example)

**Describe the bug** When running the transformer-XL example on enwik8, the log shows there are only 204 unique tokens (vocabulary size) in enwik8 training set. **To Reproduce** Steps to reproduce...
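A quick way to sanity-check a vocabulary size like this is to count the distinct byte values in the raw data. The sketch below assumes byte-level tokenization, which may not match what the example's data pipeline actually does. The helper name `byte_vocabulary` and the in-memory sample are hypothetical.

```python
from collections import Counter


def byte_vocabulary(data: bytes) -> Counter:
    """Map each distinct byte value in data to its number of occurrences."""
    return Counter(data)


# Small in-memory sample standing in for the real training split.
sample = b"<page>hello</page>"
vocab = byte_vocabulary(sample)
print("unique byte values:", len(vocab))

# For a real file, read it in binary mode (the path is an assumption):
# with open("enwik8", "rb") as f:
#     print(len(byte_vocabulary(f.read())))
```

If the count from the raw file is close to the number in the log, the small vocabulary follows from byte-level or character-level tokenization rather than from a preprocessing bug.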

Any idea to scale up to complex networks?

Thank you for the great effort! 1. Can the choice of Variables instead of nn.Parameters be explained by [this post](https://discuss.pytorch.org/t/nn-parameter-doesnt-retain-grad-fn/29214)? 2. I think relying on Variables is...

When will the pretraining code be available?

Dear Authors: Congratulations on this great work! May I kindly ask when the pretraining code will be available? Thank you very much!