SuperGluePretrainedNetwork
Do NMS before or after keypoints are detected: what's the difference?
Hi, I noticed that there are slight differences between the Magicleap SuperPoint implementation by Daniel and this repo.
In this repo, NMS (defined by a max-pooling kernel) is performed on the heatmap/scores matrix (https://github.com/magicleap/SuperGluePretrainedNetwork/blob/master/models/superpoint.py#L167), but in Daniel's version, NMS is performed on the keypoint locations (https://github.com/magicleap/SuperPointPretrainedNetwork/blob/master/demo_superpoint.py#L258). I can't figure out why you made this revision, and what the difference is in the final keypoint locations. Thanks for your reply :)
@Skydes
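For context, the point-based variant works roughly as follows: greedily keep the highest-scoring keypoint, suppress every point within the radius, and repeat. A simplified sketch (in the spirit of, but not identical to, the `nms_fast` in `demo_superpoint.py`):

```python
import numpy as np

def nms_points(xys, scores, dist_thresh):
    """Greedy point-list NMS sketch (illustrative, not the repo's code):
    keep the best remaining point, suppress all points within
    dist_thresh of it, and repeat until none are left."""
    order = np.argsort(-scores)
    suppressed = np.zeros(len(scores), dtype=bool)
    keep = []
    for i in order:
        if suppressed[i]:
            continue
        keep.append(i)
        # Chebyshev distance, matching the square window of the grid version
        dist = np.abs(xys - xys[i]).max(axis=1)
        suppressed |= dist <= dist_thresh
    return np.array(keep)
```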
Hi @Vincentqyw, good question! The two approaches produce nearly-identical results, but the implementation of this repo:
- is much faster: it can run on the GPU (it works on a `torch.Tensor`) and takes near-constant time (instead of iterating over the keypoints);
- can be batched if images have the same size.
The whole forward pass of SuperPoint is actually batched, so we used this implementation of SuperPoint in our training pipeline, detecting & describing data-augmented images on the fly.
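A minimal sketch of what this enables (assuming the `simple_nms` defined in this repo's `models/superpoint.py`; the batch size and image shape below are illustrative):

```python
import torch
from models.superpoint import simple_nms  # assuming the repo's module layout

# A whole batch of score maps is suppressed in a single call, on GPU if available.
device = 'cuda' if torch.cuda.is_available() else 'cpu'
scores = torch.rand(8, 480, 640, device=device)  # (batch, height, width)
nmsed = simple_nms(scores, nms_radius=4)  # same shape; non-maxima are zeroed out
```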
Excellent! Thx again for your prompt reply!
Hi @Skydes, if I may ask something related: why should points that are not initially detected as local maxima in `max_mask = scores == max_pool(scores)` be added back in `max_mask = max_mask | (new_max_mask & (~supp_mask))`?
https://github.com/magicleap/SuperGluePretrainedNetwork/blob/ddcf11f42e7e0732a0c4607648f9448ea8d73590/models/superpoint.py#L55-L62
I'm struggling to see the intuition behind this decision. I believe this is also a difference with respect to Daniel's SuperPoint implementation.
Thanks in advance! :)
The initialization of L56 is too conservative, as it doesn't account for chains: consider a point A that suppresses B, which itself suppresses C. Daniel's original implementation would retain both A and C if they are sufficiently far away from each other, while L56 would retain A only. By removing the suppressed pixels (`supp_mask`) when forming `supp_scores` and propagating scores for a few iterations, we recover points like C. This is a very good approximation and much faster than the original iterative implementation.
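To make the chain concrete, here is a small 1D illustration with made-up values: with a radius of 2, A = .9 at index 0 suppresses B = .6 at index 2, which itself masks C = .5 at index 4, even though A and C are far enough apart to coexist.

```python
import torch

def max_pool(x, r=2):
    return torch.nn.functional.max_pool2d(x, kernel_size=2*r + 1, stride=1, padding=r)

scores = torch.tensor([[[[.9, .0, .6, .0, .5, .0, .0]]]])  # A, B, C on one row
max_mask = scores == max_pool(scores)
print(max_mask.nonzero()[:, -1].tolist())  # [0]: C is hidden behind the suppressed B

# One propagation step: zero out everything near current maxima and look again.
supp_mask = max_pool(max_mask.float()) > 0
supp_scores = torch.where(supp_mask, torch.zeros_like(scores), scores)
new_max_mask = supp_scores == max_pool(supp_scores)
max_mask = max_mask | (new_max_mask & ~supp_mask)
print(max_mask.nonzero()[:, -1].tolist())  # [0, 4]: C is recovered
```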
Understood, thanks a lot @Skydes
Sorry to bother you again @Skydes, I experimented with the code a bit and found a possible undesired result.
This way of performing NMS can produce false positives. See for instance this 1D example with an NMS radius of 2:
In this case the false positive is 4.
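(The example itself was posted as a figure; a minimal reconstruction with illustrative values, a ramp whose only true local maximum is .9 at index 1, shows the effect:)

```python
import torch

def max_pool(x, r=2):
    return torch.nn.functional.max_pool2d(x, kernel_size=2*r + 1, stride=1, padding=r)

scores = torch.tensor([[[[.8, .9, .8, .7, .6, .5, .4]]]])  # monotone ramp, peak at index 1
zeros = torch.zeros_like(scores)
max_mask = scores == max_pool(scores)
for _ in range(2):
    supp_mask = max_pool(max_mask.float()) > 0
    supp_scores = torch.where(supp_mask, zeros, scores)
    new_max_mask = supp_scores == max_pool(supp_scores)
    max_mask = max_mask | (new_max_mask & ~supp_mask)
print(max_mask.nonzero()[:, -1].tolist())  # [1, 4]: index 4 survives despite lying on the slope
```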
A more complete example with `nms_radius = 4`:
[Code for reproduction]
```python
import torch
import matplotlib.pyplot as plt


def simple_nms(scores, nms_radius: int):
    """ Fast Non-maximum suppression to remove nearby points """
    assert nms_radius >= 0

    def max_pool(x):
        return torch.nn.functional.max_pool2d(
            x, kernel_size=nms_radius*2+1, stride=1, padding=nms_radius)

    zeros = torch.zeros_like(scores)
    max_mask = scores == max_pool(scores)
    for _ in range(2):
        supp_mask = max_pool(max_mask.float()) > 0
        supp_scores = torch.where(supp_mask, zeros, scores)
        new_max_mask = supp_scores == max_pool(supp_scores)
        max_mask = max_mask | (new_max_mask & (~supp_mask))
    return torch.where(max_mask, scores, zeros)


if __name__ == "__main__":
    nms_radius = 4
    # Single-row ramp with one true maximum (this tensor is overridden by the
    # pyramid below; comment the second assignment out to try it).
    scores = torch.tensor(
        [[[[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
           [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
           [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
           [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
           [.7, .8, .9, .8, .7, .6, .5, .4, .3, .2, .1],
           [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
           [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
           [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
           [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]]]]
    )
    # Pyramid of scores with a single true maximum of .9 at (4, 2).
    scores = torch.tensor(
        [[[[.3, .4, .5, .4, .3, .2, .1, .0, .0, .0, .0],
           [.4, .5, .6, .5, .4, .3, .2, .1, .0, .0, .0],
           [.5, .6, .7, .6, .5, .4, .3, .2, .1, .0, .0],
           [.6, .7, .8, .7, .6, .5, .4, .3, .2, .1, .0],
           [.7, .8, .9, .8, .7, .6, .5, .4, .3, .2, .1],
           [.6, .7, .8, .7, .6, .5, .4, .3, .2, .1, .0],
           [.5, .6, .7, .6, .5, .4, .3, .2, .1, .0, .0],
           [.4, .5, .6, .5, .4, .3, .2, .1, .0, .0, .0],
           [.3, .4, .5, .4, .3, .2, .1, .0, .0, .0, .0]]]]
    )
    nms = simple_nms(scores, nms_radius)

    fig, axes = plt.subplots(ncols=3)
    titles = ['scores', 'expected', 'actual result']
    axes[0].imshow(scores.numpy()[0, 0], cmap='jet', vmin=0., vmax=0.9)
    axes[1].imshow((scores == 0.9).float()[0, 0].numpy(), cmap='Greens', vmin=0, vmax=1)
    axes[2].imshow((nms > 0).float()[0, 0].numpy(), cmap='Greens', vmin=0, vmax=1)
    for axi, title in zip(axes, titles):
        axi.set(title=title)
    fig.tight_layout()
    plt.show()
```
I believe this issue can be corrected easily by checking with an NMS radius of 1 whether the newly detected keypoints are local maxima, i.e.:
```python
import torch


def simple_nms(scores, nms_radius: int):
    """ Fast Non-maximum suppression to remove nearby points """
    assert nms_radius >= 0

    def max_pool(x):
        return torch.nn.functional.max_pool2d(
            x, kernel_size=nms_radius*2+1, stride=1, padding=nms_radius)

    # ❕ added function: 3x3 max-pool, i.e. a radius-1 local-maximum test
    def max_pool_r1(x):
        return torch.nn.functional.max_pool2d(
            x, kernel_size=3, stride=1, padding=1)

    zeros = torch.zeros_like(scores)
    max_mask = scores == max_pool(scores)
    max_mask_r1 = scores == max_pool_r1(scores)  # ❕ added line
    for _ in range(2):
        supp_mask = max_pool(max_mask.float()) > 0
        supp_scores = torch.where(supp_mask, zeros, scores)
        new_max_mask = supp_scores == max_pool(supp_scores)
        max_mask = max_mask | (new_max_mask & (~supp_mask) & max_mask_r1)  # ❕ modified line
    return torch.where(max_mask, scores, zeros)
```
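As a quick check, on the illustrative 1D ramp from above the patched version keeps only the true peak:

```python
scores = torch.tensor([[[[.8, .9, .8, .7, .6, .5, .4]]]])
print(simple_nms(scores, 2).nonzero()[:, -1].tolist())  # [1]: the slope point is rejected
```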
New result:
As a side note, I feel this issue is unlikely to occur often or to matter much (the proof is that SuperGlue works great :) ). Nevertheless, the fix may help a bit (if deemed useful I can submit a PR).
On the other hand, it may not be sensible to introduce this change with the current SuperGlue weights: the keypoint patterns detected by SuperPoint may shift slightly, so the change should at least be evaluated first.
Good catch, and thank you for the detailed analysis and minimal code example. This should ideally be fixed, but I indeed expect it not to make much difference. I'll run some evaluations when I find the time.