Patch embeddings are all identical with float16 on MPS
The following code works with:
- float16 or float32 on CUDA
- float32 on MPS

When I run it with float16 on MPS, all the patch embeddings come out identical.
```python
features = self.embedder.get_intermediate_layers(
    processed_frame,
    n=1,
    reshape=True,
    return_class_token=False,
    norm=True,
)
embedding = features[0].squeeze().detach()
```
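For reference, a quick way to confirm the collapse is to check the spread across patch locations and look for non-finite values. A minimal sketch, assuming `embedding` has shape `(C, H, W)` after the `squeeze()` above (one input frame, `reshape=True`):

```python
import torch

# `embedding` comes from the snippet above; with reshape=True and a single
# input frame it should have shape (C, H, W) after squeeze().
C = embedding.shape[0]
patches = embedding.reshape(C, -1)  # (C, H*W): one column per patch location

# If every patch embedding is identical, the per-channel spread across
# patch locations collapses to (nearly) zero.
print("max std across patches:", patches.std(dim=-1).max().item())

# FP16 overflow upstream often shows up as inf/NaN in the output.
print("non-finite values present:", (~torch.isfinite(embedding)).any().item())
```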
Hello! I noticed a similar problem when quantizing our model (which uses DINOv3 as a backbone) to FP16. I traced it back to the first layer of the model, where the attention dot product (before the softmax) produces values larger than the maximum representable FP16 value (~65504). Here is a histogram of the attention dot-product values for a sample image input:
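If you want to check whether your own checkpoint hits the same overflow, a forward pre-hook on the first attention block is enough to inspect the pre-softmax dot products. This is a rough sketch, not verified against the released code: the module path `model.blocks[0].attn` and the `qkv`/`num_heads` attributes follow the DINOv2-style ViT layout and may differ in your DINOv3 build; `model` stands for the backbone (`self.embedder` in the snippet above) and `processed_frame` is the same input.

```python
import torch

FP16_MAX = torch.finfo(torch.float16).max  # 65504.0

def check_attn_dot_products(module, args):
    # args[0] is the token sequence entering the attention block: (B, N, C)
    x = args[0]
    B, N, C = x.shape
    head_dim = C // module.num_heads
    qkv = module.qkv(x).reshape(B, N, 3, module.num_heads, head_dim)
    # Compute q @ k^T in float32 so the check itself cannot overflow.
    q = qkv[:, :, 0].transpose(1, 2).float()  # (B, heads, N, head_dim)
    k = qkv[:, :, 1].transpose(1, 2).float()
    dots = q @ k.transpose(-2, -1)  # raw pre-softmax dot products
    print(f"max |q @ k^T| in block 0: {dots.abs().max().item():.1f} "
          f"(FP16 max: {FP16_MAX})")

# Module path is an assumption based on the DINOv2-style ViT implementation.
handle = model.blocks[0].attn.register_forward_pre_hook(check_attn_dot_products)
_ = model.get_intermediate_layers(processed_frame, n=1, reshape=True, norm=True)
handle.remove()
```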
We implemented a very involved fix: normalizing the query and key values before the attention dot product and then fine-tuning for an additional 100 epochs in FP16 (the general idea is sketched below). If your application requires FP16, my recommendation would be to use DINOv2, where we did not see this problem (also shown in the figure).
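For anyone curious about the normalization idea, one common variant is L2-normalizing the queries and keys along the head dimension (QK normalization) so every dot product is bounded before the softmax. A minimal, generic sketch of that idea, not our exact implementation, and note that it changes the model's behaviour, which is why the additional fine-tuning was needed:

```python
import torch
import torch.nn.functional as F

def qk_normalized_attention(q, k, v, temperature=10.0):
    """Scaled dot-product attention with L2-normalized queries and keys.

    q, k, v: (B, heads, N, head_dim). Normalizing q and k bounds each raw
    dot product to [-1, 1], keeping the pre-softmax values far below the
    FP16 maximum (~65504) regardless of head_dim.
    """
    q = F.normalize(q, dim=-1)
    k = F.normalize(k, dim=-1)
    # A fixed or learned temperature replaces the usual 1/sqrt(head_dim) scale.
    attn = (q @ k.transpose(-2, -1)) * temperature
    attn = attn.softmax(dim=-1)
    return attn @ v
```

In practice the temperature is usually a learned parameter, and the backbone has to be fine-tuned after the change, as noted above.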