Incorrect keypoint batch handling inside SuperGlueForKeypointMatching
### System Info

- `transformers` version: 4.51.3
- Platform: Linux-6.14.6-arch1-1-x86_64-with-glibc2.41
- Python version: 3.12.10
- Huggingface_hub version: 0.30.2
- Safetensors version: 0.5.3
- Accelerate version: not installed
- Accelerate config: not found
- DeepSpeed version: not installed
- PyTorch version (GPU?): 2.6.0 (False)
- Tensorflow version (GPU?): not installed (NA)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using distributed or parallel set-up in script?: no
### Who can help?
@qubvel @sbucaille
### Information
- [ ] The official example scripts
- [ ] My own modified scripts
### Tasks
- [ ] An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- [ ] My own task or dataset (give details below)
### Reproduction
- Install `pytorch`, `pillow`, and `transformers==4.51.3` using either pip or pixi.
- Run the following script:
```python
import torch
from transformers import AutoImageProcessor, AutoModel
from PIL import Image
import requests


class Test:
    def __init__(self):
        self.processor = AutoImageProcessor.from_pretrained("magic-leap-community/superglue_outdoor")
        self.model = AutoModel.from_pretrained("magic-leap-community/superglue_outdoor")

    @torch.inference_mode()
    def get_keypoints(
        self,
        series1: list[Image.Image],
        series2: list[Image.Image]
    ):
        # Batch the image pairs as [[s1_0, s2_0], [s1_1, s2_1], ...]
        images = []
        for s1, s2 in zip(series1, series2):
            images.append([s1, s2])
        processor_inputs = self.processor(images, return_tensors="pt")
        outputs = self.model(**processor_inputs)
        image_sizes = [[(s1.height, s1.width), (s2.height, s2.width)]
                       for s1, s2 in zip(series1, series2)]
        processed_outputs = self.processor.post_process_keypoint_matching(
            outputs, image_sizes
        )
        return processed_outputs


url_image1 = "https://raw.githubusercontent.com/magicleap/SuperGluePretrainedNetwork/refs/heads/master/assets/phototourism_sample_images/united_states_capitol_98169888_3347710852.jpg"
image1 = Image.open(requests.get(url_image1, stream=True).raw)
url_image2 = "https://raw.githubusercontent.com/magicleap/SuperGluePretrainedNetwork/refs/heads/master/assets/phototourism_sample_images/united_states_capitol_26757027_6717084061.jpg"
image2 = Image.open(requests.get(url_image2, stream=True).raw)

test = Test()
# The batch contains the same image pair twice, so both results should be identical.
kps = test.get_keypoints((image1, image1), (image2, image2))
assert torch.equal(kps[0]['keypoints0'], kps[1]['keypoints0'])
print("Assertion succeeded!")
```
### Expected behavior
The script executes successfully: `get_keypoints` returns two identical results and the assertion succeeds.
I tried to use `SuperGlueForKeypointMatching` (added in #29886) for batch inference and found that, while it works well on single image pairs, it fails on batches. I believe this is caused by an incorrect concatenation inside `SuperGlueForKeypointMatching._match_image_pair`:
https://github.com/huggingface/transformers/blob/d0c9c66d1c09df3cd70bf036e813d88337b20d4c/src/transformers/models/superglue/modeling_superglue.py#L726-L727
Changing this as follows seemingly fixed the issue for me:

```python
matches = torch.cat([matches0, matches1], dim=1).reshape(batch_size, 2, -1)
matching_scores = torch.cat([matching_scores0, matching_scores1], dim=1).reshape(batch_size, 2, -1)
```
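For intuition, here is a minimal, self-contained sketch of why the concatenation dimension matters once the result is reshaped to `(batch_size, 2, -1)`. It uses toy tensors rather than the real model outputs, and assumes the original code concatenated along the default `dim=0`:

```python
import torch

# Toy stand-ins for matches0/matches1, assumed shape (batch_size, num_keypoints):
# one row per image pair, values tagged by pair index for readability.
matches0 = torch.tensor([[10, 10], [11, 11], [12, 12]])  # first image of pairs 0..2
matches1 = torch.tensor([[20, 20], [21, 21], [22, 22]])  # second image of pairs 0..2
batch_size = matches0.shape[0]

# Buggy: cat along dim 0 stacks all matches0 rows first, so the reshape
# groups rows belonging to *different* pairs together.
buggy = torch.cat([matches0, matches1]).reshape(batch_size, 2, -1)
print(buggy[0])  # tensor([[10, 10], [11, 11]]) -> pair 0 grouped with pair 1

# Fixed: cat along dim 1 keeps both sides of each pair in the same row,
# so the reshape recovers (side 0, side 1) for each pair.
fixed = torch.cat([matches0, matches1], dim=1).reshape(batch_size, 2, -1)
print(fixed[0])  # tensor([[10, 10], [20, 20]]) -> both sides of pair 0
```

Note that for a batch size of 1 the two variants coincide, which is consistent with single-pair inference working fine.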
Why will this often fail?
In keypoint detection or matching tasks, especially for two consecutive images or different samples, the detected keypoints are almost never identical.
Even for the same image processed twice, minor differences (e.g., from random noise, augmentation, or algorithmic variance) can make the keypoint coordinates different.
When would this assertion succeed?
Only when kps[0]['keypoints0'] and kps[1]['keypoints0'] are exactly the same tensor (identical values, shape, dtype).
- `keypoints0`: coordinates of keypoints in image 0 (usually shape `[N, 2]`, where N is the number of keypoints and each entry is `[x, y]`)
- `keypoints1`: coordinates of keypoints in image 1 (shape `[M, 2]`, for M keypoints)
| keypoint0 | keypoint1 | score |
|-----------|-----------|-------|
| [644, 20] | [712, 179] | 0.9726 |
| [650, 65] | [715, 215] | 0.7948 |
| [638, 66] | [707, 213] | 0.8859 |
`score` is the similarity measure for the match.
@gspeter-max are these comments AI-generated? This fails always, not often. I can't make any sense out of this; your comments look hallucinated.

I should probably have clarified that I don't expect `kps[0]['keypoints0']` and `kps[1]['keypoints0']` to be exactly identical, but as I said in the OP, there's clearly something wrong with the way matching scores are concatenated inside transformers, which is likely fixed by the changes I suggested.
If I try to run this code as is, without modifying `transformers/src/transformers/models/superglue/modeling_superglue.py`:
```python
import torch
from transformers import AutoImageProcessor, AutoModel
from PIL import Image
import requests


class Test:
    def __init__(self):
        self.processor = AutoImageProcessor.from_pretrained("magic-leap-community/superglue_outdoor")
        self.model = AutoModel.from_pretrained("magic-leap-community/superglue_outdoor")

    @torch.inference_mode()
    def get_keypoints(
        self,
        series1: list[Image.Image],
        series2: list[Image.Image]
    ):
        images = []
        for s1, s2 in zip(series1, series2):
            images.append([s1, s2])
        processor_inputs = self.processor(images, return_tensors="pt")
        outputs = self.model(**processor_inputs)
        image_sizes = [[(s1.height, s1.width), (s2.height, s2.width)]
                       for s1, s2 in zip(series1, series2)]
        processed_outputs = self.processor.post_process_keypoint_matching(
            outputs, image_sizes
        )
        return processed_outputs


urls = [
    [
        "https://raw.githubusercontent.com/magicleap/SuperGluePretrainedNetwork/refs/heads/master/assets/phototourism_sample_images/united_states_capitol_98169888_3347710852.jpg",
        "https://raw.githubusercontent.com/magicleap/SuperGluePretrainedNetwork/refs/heads/master/assets/phototourism_sample_images/united_states_capitol_26757027_6717084061.jpg"
    ],
    [
        "https://raw.githubusercontent.com/magicleap/SuperGluePretrainedNetwork/refs/heads/master/assets/phototourism_sample_images/piazza_san_marco_06795901_3725050516.jpg",
        "https://raw.githubusercontent.com/magicleap/SuperGluePretrainedNetwork/refs/heads/master/assets/phototourism_sample_images/piazza_san_marco_15148634_5228701572.jpg"
    ],
    [
        "https://raw.githubusercontent.com/magicleap/SuperGluePretrainedNetwork/refs/heads/master/assets/phototourism_sample_images/st_pauls_cathedral_30776973_2635313996.jpg",
        "https://raw.githubusercontent.com/magicleap/SuperGluePretrainedNetwork/refs/heads/master/assets/phototourism_sample_images/st_pauls_cathedral_37347628_10902811376.jpg"
    ]
]

# Download the image pairs
pairs = []
for url_pair in urls:
    pair = []
    for url in url_pair:
        im = Image.open(requests.get(url, stream=True).raw)
        pair.append(im)
    pairs.append(pair)

series1 = []
series2 = []
for im1, im2 in pairs:
    series1.append(im1)
    series2.append(im2)

# Inference: once batched, once pair by pair
test = Test()
kps_batched = test.get_keypoints(series1, series2)

kps_single = []
for im1, im2 in zip(series1, series2):
    kps = test.get_keypoints((im1,), (im2,))[0]
    kps_single.append(kps)

print("\nNon-batched:")
for i, kps in enumerate(kps_single):
    print(f"kps[{i}]: keypoints0={kps['keypoints0'].shape}, keypoints1={kps['keypoints1'].shape}, matching_scores.mean={kps['matching_scores'].mean()}")

print("\nBatched:")
for i, kps in enumerate(kps_batched):
    print(f"kps[{i}]: keypoints0={kps['keypoints0'].shape}, keypoints1={kps['keypoints1'].shape}, matching_scores.mean={kps['matching_scores'].mean()}")
```
I get the following error (presumably because the scrambled concatenation lets matches from one pair index into another pair's smaller keypoint list):

```
Traceback (most recent call last):
  File "<>/test.py", line 69, in <module>
    kps_batched = test.get_keypoints(series1, series2)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<>/python3.12/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "<>/test.py", line 30, in get_keypoints
    processed_outputs = self.processor.post_process_keypoint_matching(
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<>/python3.12/site-packages/transformers/models/superglue/image_processing_superglue.py", line 393, in post_process_keypoint_matching
    matched_keypoints1 = keypoints1[matches0[valid_matches]]
                         ~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^
IndexError: index 1418 is out of bounds for dimension 0 with size 1371
```
If, however, I apply the changes I described, the code runs successfully:
```
Non-batched:
kps[0]: keypoints0=torch.Size([233, 2]), keypoints1=torch.Size([233, 2]), matching_scores.mean=0.4552536904811859
kps[1]: keypoints0=torch.Size([399, 2]), keypoints1=torch.Size([399, 2]), matching_scores.mean=0.4019520580768585
kps[2]: keypoints0=torch.Size([256, 2]), keypoints1=torch.Size([256, 2]), matching_scores.mean=0.3144405484199524

Batched:
kps[0]: keypoints0=torch.Size([234, 2]), keypoints1=torch.Size([234, 2]), matching_scores.mean=0.4524313807487488
kps[1]: keypoints0=torch.Size([399, 2]), keypoints1=torch.Size([399, 2]), matching_scores.mean=0.40192607045173645
kps[2]: keypoints0=torch.Size([256, 2]), keypoints1=torch.Size([256, 2]), matching_scores.mean=0.3144405484199524
```
```
AssertionError Traceback (most recent call last)
AssertionError:
```
Thanks for responding. When I tried to reproduce the issue, I got the assertion error above. So I think you were trying to show that the values are not equal, and that's why you created this issue. I didn't read the description of this issue properly; sorry about that, by the way.
cc @sbucaille if you have an idea why it may fail
Hey!

@i44p is totally right, these two lines concatenate the matches and the scores incorrectly.

In the current implementation, having 3 pairs of images results in this concatenation: `[im0-a, im0-b, im1-a] <matches> [im1-b, im2-a, im2-b]` instead of `[im0-a, im1-a, im2-a] <matches> [im0-b, im1-b, im2-b]`.
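To make that grouping concrete, here is a rough sketch with plain Python lists standing in for the per-image match tensors (the `imN-a`/`imN-b` labels are just names for illustration; the exact internals of `_match_image_pair` may differ):

```python
# Per-image results in flattened batch order: pair 0 first, then pair 1, ...
flat = ["im0-a", "im0-b", "im1-a", "im1-b", "im2-a", "im2-b"]

# Splitting this sequence in half matches the first half against the second,
# pairing images that come from different pairs:
half = len(flat) // 2
print(flat[:half], "<matches>", flat[half:])
# ['im0-a', 'im0-b', 'im1-a'] <matches> ['im1-b', 'im2-a', 'im2-b']

# The intended grouping keeps the two images of each pair together:
print(flat[0::2], "<matches>", flat[1::2])
# ['im0-a', 'im1-a', 'im2-a'] <matches> ['im0-b', 'im1-b', 'im2-b']
```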
Thanks for catching it! Also, sorry I didn't see the OP notification; I hope it wasn't critical for your work.
@qubvel I opened https://github.com/huggingface/transformers/pull/38850 which fixes the issue (full credit to @i44p though 😅)