dinov2
How to identify high-norm patches
In "Vision Transformers Need Registers" you wrote:
We clearly see that the norm of artifact patches is much higher than the norm of other patches
Could you please tell us which norm (function) you used, and over which dimension (axis)? Suppose we have a tensor of size [batch, 1 (cls_tok) + N (patches), channels]. What should we do to find (and visualize) such artifact patches?
Which norms have you tried so far on the tokens?
It's the L2 norm, as per the graph in the paper. It makes the most sense for it to be taken over the channels of the patch tokens (no CLS), and they seem to say it's at the output layer, so at the output of the model.
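To make that concrete, here is a minimal sketch of the computation described above, not the authors' code. It assumes the tokens are already available as an array of shape [batch, 1 + N, channels] with no register tokens, uses NumPy for brevity, and fills the array with random data purely for illustration. The helper name `patch_norms`, the 4x4 grid, and the `10 * median` cutoff are hypothetical; in practice you would pick a cutoff by looking at a histogram of the norms.

```python
import numpy as np

def patch_norms(tokens, grid_hw):
    """L2 norm over channels for each patch token, reshaped to the patch grid.

    tokens:  array of shape [batch, 1 + N, channels] (CLS token first; assumed layout)
    grid_hw: (H, W) with H * W == N
    returns: array of shape [batch, H, W]
    """
    h, w = grid_hw
    patches = tokens[:, 1:, :]                # drop the CLS token
    norms = np.linalg.norm(patches, axis=-1)  # L2 norm over the channel axis -> [batch, N]
    return norms.reshape(tokens.shape[0], h, w)

# Illustration on synthetic data: 2 images, a 4x4 patch grid, 8 channels.
rng = np.random.default_rng(0)
tokens = rng.normal(size=(2, 1 + 16, 8))
tokens[0, 1 + 5] *= 50                        # fake one high-norm patch (grid row 1, col 1)

norm_map = patch_norms(tokens, (4, 4))

# Hypothetical cutoff: flag patches whose norm is far above the median norm.
mask = norm_map > 10 * np.median(norm_map)
print(np.argwhere(mask))                      # (image, row, col) of flagged patches
```

To visualize, `norm_map[i]` can be passed to something like matplotlib's `imshow` and compared with the input image; with PyTorch tensors, `torch.linalg.norm(patches, dim=-1)` computes the same quantity.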