dinov2

How to identify high-norm patches

Open • shkarupa-alex opened this issue 2 years ago • 2 comments

In "VISION TRANSFORMERS NEED REGISTERS" you wrote:

We clearly see that the norm of artifact patches is much higher than the norm of other patches

Could you please tell us which norm (function) you used and over which dimension (axis)? Suppose we have a tensor of shape [batch, 1 (cls_tok) + N (patches), channels]. What should we do to find (and visualize) such artifact patches?

shkarupa-alex • Nov 01 '23 13:11

Which norms have you tried so far on the tokens?

qasfb • Nov 02 '23 12:11

It's the L2 norm, going by the graph in the paper. It makes the most sense to take it over the channel dimension of the patch tokens (excluding CLS), and the paper seems to say it is measured at the output layer, i.e., on the output of the model.

ccharest93 • Dec 05 '23 07:12
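A minimal NumPy sketch of the approach described in the last comment, assuming the tensor layout from the question ([batch, 1 + N, channels], CLS token first) and a square patch grid. The helper names, the `num_prefix_tokens` parameter, and the threshold value are hypothetical, not taken from the paper or the dinov2 code; in practice the cutoff would be chosen by looking at the distribution of patch norms. With PyTorch the core step is the same, e.g. `torch.linalg.vector_norm(tokens[:, 1:], dim=-1)`.

```python
import numpy as np


def patch_token_norms(tokens: np.ndarray, num_prefix_tokens: int = 1) -> np.ndarray:
    """L2 norm of each patch token over the channel axis.

    tokens: output features of shape [batch, num_prefix_tokens + N, channels].
    Assumption: the first `num_prefix_tokens` tokens are CLS (and any register)
    tokens and are excluded. Returns an array of shape [batch, N].
    """
    patches = tokens[:, num_prefix_tokens:, :]  # keep patch tokens only
    return np.linalg.norm(patches, axis=-1)     # L2 norm over channels


def norm_map(norms: np.ndarray, grid_h: int, grid_w: int) -> np.ndarray:
    """Reshape [batch, N] norms to [batch, grid_h, grid_w] for display as a heatmap."""
    batch, n = norms.shape
    if n != grid_h * grid_w:
        raise ValueError(f"expected {grid_h * grid_w} patches, got {n}")
    return norms.reshape(batch, grid_h, grid_w)


def high_norm_mask(norms: np.ndarray, threshold: float) -> np.ndarray:
    """Boolean mask of patches whose norm exceeds a (hypothetical) cutoff."""
    return norms > threshold


# Usage on synthetic data: 2 images, 1 CLS + 16 patches (4x4 grid), 8 channels.
rng = np.random.default_rng(0)
tokens = rng.normal(size=(2, 1 + 16, 8))
tokens[0, 1 + 5] = 100.0  # plant an artificial high-norm patch (patch 5 of image 0)

norms = patch_token_norms(tokens)            # shape (2, 16)
heat = norm_map(norms, grid_h=4, grid_w=4)   # shape (2, 4, 4)
mask = high_norm_mask(norms, threshold=50.0)
outliers = np.argwhere(mask)                 # rows of (image index, patch index)
```

Each `heat[i]` can be passed to matplotlib's `imshow` to see where the high-norm patches sit in the image; the norm is taken per token, so the choice of axis (channels) is what separates one patch from another.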