
Inference in half precision

treasan opened this issue 11 months ago · 5 comments

Hey,

just wanted to ask whether it is safe to run DINOv2 in half precision for inference. Is there any degradation in the quality of the features?

Thanks!

treasan avatar Feb 28 '24 16:02 treasan

Whether or not float16 leads to a degradation probably depends on the use case. For example, if you pass the features into a model that's heavily fit to the output of the float32 version, it may break on float16. This seems to happen (very rarely) on some depth estimation models I've tried. Using bfloat16 usually fixes the problem.
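
For reference, switching between those dtypes is just a matter of casting the weights and inputs. Here's a minimal sketch (the hub entry names are from the dinov2 README; the 518px input is just an example that's a multiple of the 14px patch size):

```python
import torch

# Load the backbone-only ViT-L model from the dinov2 hub entry point
model = torch.hub.load("facebookresearch/dinov2", "dinov2_vitl14").cuda().eval()
x = torch.randn(1, 3, 518, 518, device="cuda")

with torch.inference_mode():
    feats_f32 = model(x)                # float32 reference (class-token embedding)
    feats_f16 = model.half()(x.half())  # same model cast to float16
    # model.to(torch.bfloat16) with x.to(torch.bfloat16) would be the bfloat16 variant

# Largest absolute difference between the float32 and float16 features
print((feats_f32 - feats_f16.float()).abs().max())
```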

Following from issue #373, here's a comparison of the token norms at each block of ViT-L running on the same input image, with float32 on the left & float16 on the right. Qualitatively at least, they're identical. Even the 'high norm' tokens are the same, which suggests that the float16 conversion doesn't lead to unstable results.

(image: vitl_f32_vs_f16)
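
In case it's useful, a comparison like that can be scripted with forward hooks. This is only a rough sketch, and it assumes the hub model exposes its transformer blocks as `model.blocks` (which I believe is true for the standard, non-chunked hub builds):

```python
import torch

def per_block_token_norms(model, x):
    # Grab the output of every transformer block with forward hooks and
    # record the L2 norm of each token
    norms = []
    hook = lambda module, inputs, output: norms.append(output.norm(dim=-1).float().cpu())
    handles = [blk.register_forward_hook(hook) for blk in model.blocks]
    with torch.inference_mode():
        model(x)
    for h in handles:
        h.remove()
    return norms  # one [batch, num_tokens] tensor per block

model = torch.hub.load("facebookresearch/dinov2", "dinov2_vitl14").cuda().eval()
x = torch.randn(1, 3, 518, 518, device="cuda")

norms_f32 = per_block_token_norms(model, x)
norms_f16 = per_block_token_norms(model.half(), x.half())

for i, (a, b) in enumerate(zip(norms_f32, norms_f16)):
    rel = ((a - b).abs() / a.clamp(min=1e-6)).max().item()
    print(f"block {i:2d}: max relative difference in token norm = {rel:.4f}")
```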

heyoeyo avatar Feb 28 '24 17:02 heyoeyo

Thank you very much!

treasan avatar Feb 29 '24 00:02 treasan

Just as a follow-up, so as not to give the impression that bfloat16 is always the better choice, here's a small zoomed/cropped section of a depth estimate of some rocks on the ground. There's a plane-of-best-fit removal and a contrast boost, so it's a bit of an extreme example, just to show the differences. Float32 is on the left, then float16, then bfloat16:

(image: f32_f16_bf16)

Float32 and float16 look very similar (there are slight differences when flipping between the images, though). On the other hand, bfloat16 has visible grid-like artifacts.

From what I've seen, float16 is usually fine, but will occasionally give random inf or NaN results; in those cases bfloat16 tends to give more reasonable results (though it otherwise always has the small artifacts).
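
If it helps, one way to guard against those rare cases (just a sketch of the pattern, not something from the dinov2 repo) is to keep the weights in float32, run under autocast in float16, and only retry in bfloat16 (or full float32) when the output contains inf/NaN:

```python
import torch

def features_with_fallback(model, x):
    # Try float16 first for speed, then bfloat16 if the output isn't finite
    for dtype in (torch.float16, torch.bfloat16):
        with torch.inference_mode(), torch.autocast("cuda", dtype=dtype):
            feats = model(x)
        if torch.isfinite(feats).all():
            return feats, dtype
    # Both half-precision modes produced inf/NaN -> fall back to float32
    with torch.inference_mode():
        return model(x), torch.float32

model = torch.hub.load("facebookresearch/dinov2", "dinov2_vitl14").cuda().eval()
x = torch.randn(1, 3, 518, 518, device="cuda")
feats, used_dtype = features_with_fallback(model, x)
print(used_dtype, feats.shape)
```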

heyoeyo avatar Feb 29 '24 16:02 heyoeyo

hi @heyoeyo, I'm curious about the inference speed-up from using float16 or bfloat16. Would you care to share your experience with it?

maulanaazhari avatar Sep 11 '24 01:09 maulanaazhari

hi @maulanaazhari

Switching from float32 to float16 or bfloat16 consistently gives about a 2x speed-up on almost any model I've tried (very small models are the only exception). If you're also using xFormers with dinov2, that seems to give an additional speed-up when using float16.
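
If you want numbers for your own setup, a rough timing loop like the one below is enough to see the difference (exact speed-ups depend on the GPU, batch size and input resolution; the hub entry names are from the dinov2 README):

```python
import time
import torch

def avg_ms(model, x, iters=50, warmup=5):
    with torch.inference_mode():
        for _ in range(warmup):       # warm up kernels / autotuning
            model(x)
        torch.cuda.synchronize()
        t0 = time.perf_counter()
        for _ in range(iters):
            model(x)
        torch.cuda.synchronize()      # wait for all queued GPU work before stopping the clock
    return (time.perf_counter() - t0) * 1000 / iters

model = torch.hub.load("facebookresearch/dinov2", "dinov2_vitl14").cuda().eval()
x = torch.randn(1, 3, 518, 518, device="cuda")

t32 = avg_ms(model, x)
t16 = avg_ms(model.half(), x.half())  # cast the same model in place to float16
print(f"float32: {t32:.1f} ms, float16: {t16:.1f} ms (~{t32 / t16:.1f}x)")
```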

heyoeyo avatar Sep 11 '24 13:09 heyoeyo