How to improve inference time
My code is:
import numpy as np
import torch

def extract_global_vector(backbone, img_tensor, device):
    # Add a batch dimension and move the image to the target device.
    x = img_tensor.unsqueeze(0).to(device)
    with torch.inference_mode():
        feats = backbone.get_intermediate_layers(x, n=range(12), reshape=True, norm=False)
    last = feats[-1].squeeze(0)
    if last.ndim == 3:
        # (C, H, W) feature map: global-average-pool over the spatial grid.
        vec = last.view(last.shape[0], -1).mean(dim=1)
    elif last.ndim == 2:
        # (tokens, C) token sequence: average over tokens.
        vec = last.mean(dim=0)
    else:
        raise RuntimeError(f"Unexpected feature shape: {last.shape}")
    vec = vec.detach().cpu().numpy().astype(np.float32)
    # L2-normalize with a small epsilon to avoid division by zero.
    norm = np.linalg.norm(vec) + 1e-8
    vec = vec / norm
    return vec
I want to reduce the feature-extraction time. The main cost is in this line:
feats = backbone.get_intermediate_layers(x, n=range(12), reshape=True, norm=False)
Is there any way to improve it?
Maybe try to torch.compile your backbone? (backbone = torch.compile(backbone))
Also, since you only use last = feats[-1], would it not be better to just use forward_features or get_intermediate_layers(x, n=1)? (This is a question; I'm not sure they do the same thing as what you are doing here.)
I tried feats = backbone.forward_features(x) and feats = backbone.get_intermediate_layers(x, n=[11], reshape=True, norm=False), and they are not really faster. But backbone = torch.compile(backbone) did work, thanks a lot! I would like to know about more ways to improve.
If you're using a GPU, you should also autocast your forward call here to fp16!
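A sketch of what that could look like, again with a placeholder model rather than the real backbone. On CUDA, autocast runs convolutions and matmuls in fp16; on CPU it falls back to bfloat16 (fp16 autocast is a CUDA feature). The result should be cast back to fp32 before the NumPy normalization step:

```python
import torch

# Hypothetical stand-in for the real backbone.
backbone = torch.nn.Sequential(
    torch.nn.Conv2d(3, 8, 3, padding=1),
    torch.nn.AdaptiveAvgPool2d(1),
    torch.nn.Flatten(),
)

device = "cuda" if torch.cuda.is_available() else "cpu"
backbone = backbone.to(device).eval()
x = torch.randn(1, 3, 32, 32, device=device)

# fp16 on GPU, bf16 on CPU (fp16 autocast is not supported on CPU).
dtype = torch.float16 if device == "cuda" else torch.bfloat16
with torch.inference_mode(), torch.autocast(device_type=device, dtype=dtype):
    feats = backbone(x)

# Cast back to fp32 before the .numpy()/normalization step.
vec = feats.squeeze(0).float()
print(vec.dtype)  # torch.float32
```

The half-precision forward mainly helps on GPU, where it roughly halves memory traffic; keep the final normalization in fp32 so the L2 norm stays accurate.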