key shape torch.Size([400, 1, 256]) does not match value shape torch.Size([1, 1, 256])
Describe the bug
I followed the README to train HGNetv2 on the COCO dataset: I downloaded COCO, modified the .yml files as instructed, and ran the train command.
It immediately raises a PyTorch error:
File "/home/gabriel/Documents/D-FINE/src/zoo/dfine/hybrid_encoder.py", line 279, in forward
src, _ = self.self_attn(q, k, value=src, attn_mask=src_mask)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/gabriel/miniconda3/envs/dfine/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/gabriel/miniconda3/envs/dfine/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1879, in _call_impl
return inner()
^^^^^^^
File "/home/gabriel/miniconda3/envs/dfine/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1827, in inner
result = forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/gabriel/miniconda3/envs/dfine/lib/python3.11/site-packages/torch/nn/modules/activation.py", line 1380, in forward
attn_output, attn_output_weights = F.multi_head_attention_forward(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/gabriel/miniconda3/envs/dfine/lib/python3.11/site-packages/torch/nn/functional.py", line 6293, in multi_head_attention_forward
assert key.shape == value.shape, (
^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: key shape torch.Size([400, 1, 256]) does not match value shape torch.Size([1, 1, 256])
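For context, this assertion reproduces in isolation whenever the value's sequence length differs from the key's in PyTorch's (seq_len, batch, embed_dim) layout. A minimal sketch with the shapes from the traceback (the num_heads value is an assumption, not taken from the D-FINE config):

```python
import torch
import torch.nn as nn

# Shapes copied from the traceback; layout is (seq_len, batch, embed_dim).
attn = nn.MultiheadAttention(embed_dim=256, num_heads=8)  # num_heads=8 is an assumption
q = torch.randn(400, 1, 256)
k = torch.randn(400, 1, 256)
v = torch.randn(1, 1, 256)  # collapsed sequence dimension, as reported

try:
    attn(q, k, value=v)
except AssertionError as e:
    print(e)  # same "key shape ... does not match value shape ..." message
```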
By tracing the tensor back, it seems the tensor of size [1, 1, 256] is produced just after the backbone (HGNetv2) in the DFINE.forward method:
class DFINE(nn.Module):
    __inject__ = [
        "backbone",
        "encoder",
        "decoder",
    ]

    def __init__(
        self,
        backbone: nn.Module,
        encoder: nn.Module,
        decoder: nn.Module,
    ):
        super().__init__()
        self.backbone = backbone
        self.decoder = decoder
        self.encoder = encoder

    def forward(self, x, targets=None):
        x = self.backbone(x)  # <-- Here x becomes the wrong shape
        x = self.encoder(x)
        x = self.decoder(x, targets)
        return x
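To pin down where the (1, 1, 256) tensor first appears, a throwaway shape logger between the stages can help. This is a debugging sketch only: ToyBackbone and log_feature_shapes are hypothetical helpers standing in for HGNetv2 (which, like most detection backbones, should return multi-scale (N, C, H, W) feature maps), not part of D-FINE:

```python
import torch
import torch.nn as nn

def log_feature_shapes(feats, tag="backbone"):
    """Print each feature map's shape; accepts a single tensor or a list/tuple."""
    if isinstance(feats, torch.Tensor):
        feats = [feats]
    for i, f in enumerate(feats):
        print(f"{tag}[{i}]: {tuple(f.shape)}")
    return list(feats)

class ToyBackbone(nn.Module):
    """Hypothetical stand-in for HGNetv2: returns multi-scale feature maps."""
    def forward(self, x):
        n = x.shape[0]
        return [torch.randn(n, 256, 20, 20), torch.randn(n, 256, 10, 10)]

# Call it right after self.backbone(x) in DFINE.forward; here on the toy model:
feats = log_feature_shapes(ToyBackbone()(torch.randn(1, 3, 640, 640)))
```

If one of the printed shapes is already (1, 1, 256) at this point, the problem is in the backbone or its config, not in the hybrid encoder where the assertion fires.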
I'm not sure what I'm doing wrong. The README is hard to follow for a newcomer, and I don't understand how this is supposed to work.
I tried with the .yml file of D-FINE-S trained on COCO+Objects365.
By the way, I wanted to fine-tune D‑FINE‑S, not HGNetv2. How can I fine-tune the model I downloaded? How can I be sure it fine-tunes the right model trained on the right datasets?
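On the fine-tuning question: detection training scripts of this kind typically take a downloaded .pth checkpoint and load its weights into the model before training. A hedged sketch of what that loading looks like; the checkpoint key names ("ema"/"module"/"model") and the helper name extract_state_dict are assumptions, so inspect your downloaded file (print(ckpt.keys())) to confirm:

```python
import torch
import torch.nn as nn

def extract_state_dict(ckpt):
    """Pull model weights out of a training checkpoint dict.
    The 'ema'/'module'/'model' key names are assumptions; check what your
    downloaded .pth actually contains before relying on them."""
    if isinstance(ckpt, dict):
        ema = ckpt.get("ema")
        if isinstance(ema, dict) and "module" in ema:
            return ema["module"]  # EMA weights, often preferred for fine-tuning
        if "model" in ckpt:
            return ckpt["model"]
    return ckpt  # the file may already be a bare state_dict

# Demo with an in-memory checkpoint instead of a real D-FINE-S .pth file:
net = nn.Linear(4, 2)
ckpt = {"model": net.state_dict()}
net.load_state_dict(extract_state_dict(ckpt))  # strict load succeeds: keys match
```

When your dataset has a different class count than the pretraining data, loading with model.load_state_dict(state, strict=False) tolerates the mismatched detection-head weights while keeping the rest of the pretrained model.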
Thank you.
Desktop (please complete the following information):
- OS: Ubuntu 24.04
- Version: Just the latest commit on master