Patch embeddings are all identical with float16 on MPS
The following code works with:
- float16 or float32 on CUDA
- float32 on MPS

When I run it with float16 on MPS, all the patch embeddings come out identical.
```python
features = self.embedder.get_intermediate_layers(
    processed_frame,
    n=1,
    reshape=True,
    return_class_token=False,
    norm=True,
)
embedding = features[0].squeeze().detach()
```
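For reference, a quick way to confirm the collapse is to check the spread across patch locations and look for non-finite values. A minimal sketch, assuming `embedding` has shape `(C, H, W)` after the `squeeze()` above (one input frame, `reshape=True`):

```python
import torch

# `embedding` comes from the snippet above; with reshape=True and a single
# input frame it should have shape (C, H, W) after squeeze().
C = embedding.shape[0]
patches = embedding.reshape(C, -1)  # (C, H*W): one column per patch location

# If every patch embedding is identical, the per-channel spread across
# patch locations collapses to (nearly) zero.
print("max std across patches:", patches.std(dim=-1).max().item())

# FP16 overflow upstream often shows up as inf/NaN in the output.
print("non-finite values present:", (~torch.isfinite(embedding)).any().item())
```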
Hello! I noticed a similar problem when quantizing our model (which uses DINOv3 as a backbone) to FP16. I traced it back to the first layer of the model, where the attention dot product (before the softmax) produces values larger than the maximum representable FP16 value (~65504). Here is a histogram of the attention dot-product values for a sample image input:
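If you want to check whether your own checkpoint hits the same overflow, a forward pre-hook on the first attention block is enough to inspect the pre-softmax dot products. This is a rough sketch, not verified against the released code: the module path `model.blocks[0].attn` and the `qkv`/`num_heads` attributes follow the DINOv2-style ViT layout and may differ in your DINOv3 build; `model` stands for the backbone (`self.embedder` in the snippet above) and `processed_frame` is the same input.

```python
import torch

FP16_MAX = torch.finfo(torch.float16).max  # 65504.0

def check_attn_dot_products(module, args):
    # args[0] is the token sequence entering the attention block: (B, N, C)
    x = args[0]
    B, N, C = x.shape
    head_dim = C // module.num_heads
    qkv = module.qkv(x).reshape(B, N, 3, module.num_heads, head_dim)
    # Compute q @ k^T in float32 so the check itself cannot overflow.
    q = qkv[:, :, 0].transpose(1, 2).float()  # (B, heads, N, head_dim)
    k = qkv[:, :, 1].transpose(1, 2).float()
    dots = q @ k.transpose(-2, -1)  # raw pre-softmax dot products
    print(f"max |q @ k^T| in block 0: {dots.abs().max().item():.1f} "
          f"(FP16 max: {FP16_MAX})")

# Module path is an assumption based on the DINOv2-style ViT implementation.
handle = model.blocks[0].attn.register_forward_pre_hook(check_attn_dot_products)
_ = model.get_intermediate_layers(processed_frame, n=1, reshape=True, norm=True)
handle.remove()
```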
We implemented a very involved fix: normalizing the query and key values before the attention dot product and then fine-tuning for an additional 100 epochs in FP16 (the general idea is sketched below). If your application requires FP16, my recommendation would be to use DINOv2, where we did not see this problem (also shown in the figure).
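For anyone curious about the normalization idea, one common variant is L2-normalizing the queries and keys along the head dimension (QK normalization) so every dot product is bounded before the softmax. A minimal, generic sketch of that idea, not our exact implementation, and note that it changes the model's behaviour, which is why the additional fine-tuning was needed:

```python
import torch
import torch.nn.functional as F

def qk_normalized_attention(q, k, v, temperature=10.0):
    """Scaled dot-product attention with L2-normalized queries and keys.

    q, k, v: (B, heads, N, head_dim). Normalizing q and k bounds each raw
    dot product to [-1, 1], keeping the pre-softmax values far below the
    FP16 maximum (~65504) regardless of head_dim.
    """
    q = F.normalize(q, dim=-1)
    k = F.normalize(k, dim=-1)
    # A fixed or learned temperature replaces the usual 1/sqrt(head_dim) scale.
    attn = (q @ k.transpose(-2, -1)) * temperature
    attn = attn.softmax(dim=-1)
    return attn @ v
```

In practice the temperature is usually a learned parameter, and the backbone has to be fine-tuned after the change, as noted above.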