
Patch embeddings are all the same with float16 on mps

Open · oilst opened this issue 1 month ago · 1 comment

The following code works with:

  • float16 or float32 on cuda
  • float32 on mps

When I execute it with float16 on mps, all the patch embeddings are equal.

# self.embedder is the DINOv3 backbone; processed_frame is the preprocessed image tensor
features = self.embedder.get_intermediate_layers(
    processed_frame,
    n=1,                      # features from the last block only
    reshape=True,             # return a (B, C, H_patches, W_patches) map
    return_class_token=False,
    norm=True                 # apply the final LayerNorm
)
embedding = features[0].squeeze().detach()
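For a self-contained check, here is a minimal sketch of the comparison (the `dinov3_vits16` hub entry point and the `weights` argument are my assumptions; substitute whatever backbone and checkpoint you actually load). If the patch embeddings have collapsed, their standard deviation across the spatial grid will be (near) zero:

import torch

# Assumed hub entry point and checkpoint path; adjust to your setup.
model = torch.hub.load(
    "facebookresearch/dinov3", "dinov3_vits16", weights="<path/to/checkpoint>"
)
model.eval()

frame = torch.rand(1, 3, 224, 224)  # dummy image, H/W multiples of the 16-px patch size

for device, dtype in [("mps", torch.float32), ("mps", torch.float16)]:
    m = model.to(device=device, dtype=dtype)
    x = frame.to(device=device, dtype=dtype)
    with torch.no_grad():
        feats = m.get_intermediate_layers(
            x, n=1, reshape=True, return_class_token=False, norm=True
        )
    emb = feats[0].squeeze().float()
    # Collapsed embeddings -> (near) zero variation across the patch grid.
    print(device, dtype, "mean spatial std:", emb.std(dim=(1, 2)).mean().item())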

oilst avatar Nov 28 '25 08:11 oilst

Hello! I noticed a similar problem when quantizing our model (which uses DinoV3 as a backbone) to FP16. I traced the problem back to the first layer of the model, where the attention dot product (before the softmax) yields values larger than the largest finite FP16 value (65504), so they overflow. Here is a histogram of the attention dot-product values for a sample image input:

[Figure: histogram of pre-softmax attention dot-product values for DinoV3 and Dinov2 on a sample image]
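To make the failure mode concrete, here is a toy sketch (made-up numbers, not DINOv3 code) of what happens once the pre-softmax scores exceed the largest finite FP16 value:

import torch

# Toy pre-softmax attention scores; the first two exceed the FP16 range.
scores = torch.tensor([[70000.0, 69000.0, 10.0]])

print(scores.half())                              # -> [inf, inf, 10.] after the cast

# In FP32 the softmax is still well defined.
print(torch.softmax(scores, dim=-1))              # -> roughly [1., 0., 0.]

# Once the scores have overflowed to inf, the max-subtraction inside softmax
# computes inf - inf = nan, so the attention weights become NaN.
print(torch.softmax(scores.half().float(), dim=-1))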

We implemented a VERY involved fix (normalizing the query and key values before the outer product and fine-tuning for an additional 100 epochs in FP16). If your application requires FP16, my recommendation would be to use Dinov2, where we did not see this problem (also shown in the figure).
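For anyone curious what that kind of fix looks like, here is a rough sketch of query/key normalization (my own illustration, not our exact change; the learnable scale and the tensor shapes are assumptions). Because the pretrained weights were not trained with normalized q/k, this changes the network, which is why the extra fine-tuning in FP16 was needed:

import torch
import torch.nn.functional as F

def qk_normalized_scores(q: torch.Tensor, k: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    # q, k: (batch, heads, tokens, head_dim); scale: learnable temperature
    # replacing the usual 1/sqrt(head_dim).
    # L2-normalizing q and k bounds each dot product to [-1, 1], so the
    # pre-softmax scores stay far inside the FP16 range.
    q = F.normalize(q, dim=-1)
    k = F.normalize(k, dim=-1)
    return scale * (q @ k.transpose(-2, -1))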

bango123 avatar Dec 18 '25 20:12 bango123