gradnorm-pytorch
How to properly set grad_norm_parameters?
Let's assume I have a single image feature extractor with two linear classification heads on top of it. What should I set grad_norm_parameters to in this case? Is it the entire network, or only part of it (for example, the last layer shared by both heads)?
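
To make the question concrete, here is a minimal sketch of the setup in plain PyTorch. The class name, layer sizes, and variable names are all hypothetical, and it deliberately does not use the gradnorm-pytorch API. It only marks the two candidates I am choosing between. As far as I understand the GradNorm paper, the per-task gradient norms are taken with respect to the weights of the last shared layer, which is what the final lines compute by hand.

```python
import torch
from torch import nn


class TwoHeadClassifier(nn.Module):
    """Hypothetical model: one shared image backbone, two linear heads."""

    def __init__(self, feature_dim=32, num_classes_a=5, num_classes_b=3):
        super().__init__()
        # Shared feature extractor (a toy stand-in for a real backbone).
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 8, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(8, feature_dim),  # last layer shared by both tasks
        )
        # Task-specific classification heads.
        self.head_a = nn.Linear(feature_dim, num_classes_a)
        self.head_b = nn.Linear(feature_dim, num_classes_b)

    def forward(self, images):
        features = torch.relu(self.backbone(images))
        return self.head_a(features), self.head_b(features)


model = TwoHeadClassifier()

# Candidate 1: only the weight of the last layer shared by both heads.
last_shared_weight = model.backbone[-1].weight

# Candidate 2: every parameter of the network (backbone and both heads).
all_parameters = list(model.parameters())

# Dummy batch, just to show what "gradient norm per task" means here.
images = torch.randn(4, 3, 16, 16)
labels_a = torch.randint(0, 5, (4,))
labels_b = torch.randint(0, 3, (4,))

logits_a, logits_b = model(images)
loss_a = nn.functional.cross_entropy(logits_a, labels_a)
loss_b = nn.functional.cross_entropy(logits_b, labels_b)

# L2 norm of each task's gradient w.r.t. the last shared weight (candidate 1).
grad_norms = [
    torch.autograd.grad(loss, last_shared_weight, retain_graph=True)[0].norm()
    for loss in (loss_a, loss_b)
]
```

So the question is whether grad_norm_parameters should correspond to something like last_shared_weight, or to all_parameters.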