Non-local_pytorch
Softmax activation
Have you tested Softmax instead of scaling by 1/N?
Do you mean the dot product version? I have not tested it yet.
The paper applies 1/N directly to normalize f in order to simplify gradient computation, and its experiments show that the performance of the different versions is similar.
But I think normalizing by softmax may work better than 1/N. If you are interested, you could run this experiment and share the result with me, thanks!
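For reference, a minimal sketch of the two normalizations being discussed for the dot-product version; the function name `attention_map` and the tensor shapes here are illustrative, not the repository's actual API:

```python
import torch
import torch.nn.functional as F

def attention_map(theta_x, phi_x, use_softmax=False):
    """theta_x: (B, HW, C'), phi_x: (B, C', HW) -> weights of shape (B, HW, HW)."""
    f = torch.matmul(theta_x, phi_x)       # pairwise similarities f(x_i, x_j)
    if use_softmax:
        # Softmax normalization: weights for each position sum to 1.
        return F.softmax(f, dim=-1)
    # Dot-product version in the paper: scale by 1/N, where N is the number of positions.
    N = f.size(-1)
    return f / N
```

Swapping `use_softmax=True` into the dot-product block would be the experiment suggested above.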
@LDOUBLEV so have you tested Softmax instead of 1/N for the dot product and concatenation versions?