SuperGluePretrainedNetwork
Do NMS before or after keypoints are detected: what's the difference?
Hi, I noticed that there are slight differences between the Magicleap SuperPoint implementation by Daniel and this repo.
In this repo, NMS (defined by a max-pooling kernel) is performed on the heatmap/scores matrix (https://github.com/magicleap/SuperGluePretrainedNetwork/blob/master/models/superpoint.py#L167), but in Daniel's version, NMS is performed on the keypoint locations (https://github.com/magicleap/SuperPointPretrainedNetwork/blob/master/demo_superpoint.py#L258). I can't figure out why you made this revision, and what the difference is in the final keypoint locations. Thanks for your reply :)
@Skydes
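For context, the point-based variant works roughly as follows: greedily keep the highest-scoring keypoint, suppress every point within the radius, and repeat. A simplified sketch (in the spirit of, but not identical to, the `nms_fast` in `demo_superpoint.py`):

```python
import numpy as np

def nms_points(xys, scores, dist_thresh):
    """Greedy point-list NMS sketch (illustrative, not the repo's code):
    keep the best remaining point, suppress all points within
    dist_thresh of it, and repeat until none are left."""
    order = np.argsort(-scores)
    suppressed = np.zeros(len(scores), dtype=bool)
    keep = []
    for i in order:
        if suppressed[i]:
            continue
        keep.append(i)
        # Chebyshev distance, matching the square window of the grid version
        dist = np.abs(xys - xys[i]).max(axis=1)
        suppressed |= dist <= dist_thresh
    return np.array(keep)
```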
Hi @Vincentqyw, good question! The two approaches produce nearly-identical results, but the implementation of this repo:
- is much faster: it can run on the GPU (it works on a `torch.Tensor`) and takes near-constant time (instead of iterating over the keypoints);
- can be batched if images have the same size.
The whole forward pass of SuperPoint is actually batched, so we used this implementation of SuperPoint in our training pipeline, detecting & describing data-augmented images on the fly.
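A minimal sketch of what this enables (assuming the `simple_nms` defined in this repo's `models/superpoint.py`; the batch size and image shape below are illustrative):

```python
import torch
from models.superpoint import simple_nms  # assuming the repo's module layout

# A whole batch of score maps is suppressed in a single call, on GPU if available.
device = 'cuda' if torch.cuda.is_available() else 'cpu'
scores = torch.rand(8, 480, 640, device=device)  # (batch, height, width)
nmsed = simple_nms(scores, nms_radius=4)  # same shape; non-maxima are zeroed out
```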
Excellent! Thx again for your prompt reply!
Hi @Skydes, if I may ask something related: why should points that are not initially detected as local maxima in `max_mask = scores == max_pool(scores)` be added back in `max_mask = max_mask | (new_max_mask & (~supp_mask))`?
https://github.com/magicleap/SuperGluePretrainedNetwork/blob/ddcf11f42e7e0732a0c4607648f9448ea8d73590/models/superpoint.py#L55-L62
I'm struggling to see the intuition behind this decision. I believe this is also a difference with respect to Daniel's SuperPoint implementation.
Thanks in advance! :)
The initialization of L56 is too conservative, as it doesn't account for chains: consider a point A that suppresses B, which itself suppresses C. Daniel's original implementation would retain both A and C if they are sufficiently far away from each other, while L56 would retain A only. By removing the suppressed pixels (`supp_mask`) when forming `supp_scores` and propagating scores for a few iterations, we recover points like C. This is a very good approximation and much faster than the original iterative implementation.
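To make the chain concrete, here is a small 1D illustration with made-up values: with a radius of 2, A = .9 at index 0 suppresses B = .6 at index 2, which itself masks C = .5 at index 4, even though A and C are far enough apart to coexist.

```python
import torch

def max_pool(x, r=2):
    return torch.nn.functional.max_pool2d(x, kernel_size=2*r + 1, stride=1, padding=r)

scores = torch.tensor([[[[.9, .0, .6, .0, .5, .0, .0]]]])  # A, B, C on one row
max_mask = scores == max_pool(scores)
print(max_mask.nonzero()[:, -1].tolist())  # [0]: C is hidden behind the suppressed B

# One propagation step: zero out everything near current maxima and look again.
supp_mask = max_pool(max_mask.float()) > 0
supp_scores = torch.where(supp_mask, torch.zeros_like(scores), scores)
new_max_mask = supp_scores == max_pool(supp_scores)
max_mask = max_mask | (new_max_mask & ~supp_mask)
print(max_mask.nonzero()[:, -1].tolist())  # [0, 4]: C is recovered
```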
Understood, thanks a lot @Skydes
Sorry to bother you again @Skydes, I experimented with the code a bit and found a possible undesired result.
This way of performing NMS can produce false positives. See for instance this 1D example with an NMS radius of 2:
In this case the false positive is 4.
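(The example itself was posted as a figure; a minimal reconstruction with illustrative values, a ramp whose only true local maximum is .9 at index 1, shows the effect:)

```python
import torch

def max_pool(x, r=2):
    return torch.nn.functional.max_pool2d(x, kernel_size=2*r + 1, stride=1, padding=r)

scores = torch.tensor([[[[.8, .9, .8, .7, .6, .5, .4]]]])  # monotone ramp, peak at index 1
zeros = torch.zeros_like(scores)
max_mask = scores == max_pool(scores)
for _ in range(2):
    supp_mask = max_pool(max_mask.float()) > 0
    supp_scores = torch.where(supp_mask, zeros, scores)
    new_max_mask = supp_scores == max_pool(supp_scores)
    max_mask = max_mask | (new_max_mask & ~supp_mask)
print(max_mask.nonzero()[:, -1].tolist())  # [1, 4]: index 4 survives despite lying on the slope
```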
A more complete example with `nms_radius = 4`:
[Code for reproduction]
```python
import torch
import matplotlib.pyplot as plt


def simple_nms(scores, nms_radius: int):
    """ Fast Non-maximum suppression to remove nearby points """
    assert nms_radius >= 0

    def max_pool(x):
        return torch.nn.functional.max_pool2d(
            x, kernel_size=nms_radius*2+1, stride=1, padding=nms_radius)

    zeros = torch.zeros_like(scores)
    max_mask = scores == max_pool(scores)
    for _ in range(2):
        supp_mask = max_pool(max_mask.float()) > 0
        supp_scores = torch.where(supp_mask, zeros, scores)
        new_max_mask = supp_scores == max_pool(supp_scores)
        max_mask = max_mask | (new_max_mask & (~supp_mask))
    return torch.where(max_mask, scores, zeros)


if __name__ == "__main__":
    nms_radius = 4
    # Single-row ramp with one true maximum (this tensor is overridden by the
    # pyramid below; comment the second assignment out to try it).
    scores = torch.tensor(
        [[[[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
           [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
           [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
           [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
           [.7, .8, .9, .8, .7, .6, .5, .4, .3, .2, .1],
           [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
           [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
           [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
           [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]]]]
    )
    # Pyramid of scores with a single true maximum of .9 at (4, 2).
    scores = torch.tensor(
        [[[[.3, .4, .5, .4, .3, .2, .1, .0, .0, .0, .0],
           [.4, .5, .6, .5, .4, .3, .2, .1, .0, .0, .0],
           [.5, .6, .7, .6, .5, .4, .3, .2, .1, .0, .0],
           [.6, .7, .8, .7, .6, .5, .4, .3, .2, .1, .0],
           [.7, .8, .9, .8, .7, .6, .5, .4, .3, .2, .1],
           [.6, .7, .8, .7, .6, .5, .4, .3, .2, .1, .0],
           [.5, .6, .7, .6, .5, .4, .3, .2, .1, .0, .0],
           [.4, .5, .6, .5, .4, .3, .2, .1, .0, .0, .0],
           [.3, .4, .5, .4, .3, .2, .1, .0, .0, .0, .0]]]]
    )
    nms = simple_nms(scores, nms_radius)

    fig, axes = plt.subplots(ncols=3)
    titles = ['scores', 'expected', 'actual result']
    axes[0].imshow(scores.numpy()[0, 0], cmap='jet', vmin=0., vmax=0.9)
    axes[1].imshow((scores == 0.9).float()[0, 0].numpy(), cmap='Greens', vmin=0, vmax=1)
    axes[2].imshow((nms > 0).float()[0, 0].numpy(), cmap='Greens', vmin=0, vmax=1)
    for axi, title in zip(axes, titles):
        axi.set(title=title)
    fig.tight_layout()
    plt.show()
```
I believe this issue can be corrected easily by checking with an NMS radius of 1 whether the newly detected keypoints are local maxima, i.e.:
```python
import torch


def simple_nms(scores, nms_radius: int):
    """ Fast Non-maximum suppression to remove nearby points """
    assert nms_radius >= 0

    def max_pool(x):
        return torch.nn.functional.max_pool2d(
            x, kernel_size=nms_radius*2+1, stride=1, padding=nms_radius)

    # ❕ added function: 3x3 max-pool, i.e. a radius-1 local-maximum test
    def max_pool_r1(x):
        return torch.nn.functional.max_pool2d(
            x, kernel_size=3, stride=1, padding=1)

    zeros = torch.zeros_like(scores)
    max_mask = scores == max_pool(scores)
    max_mask_r1 = scores == max_pool_r1(scores)  # ❕ added line
    for _ in range(2):
        supp_mask = max_pool(max_mask.float()) > 0
        supp_scores = torch.where(supp_mask, zeros, scores)
        new_max_mask = supp_scores == max_pool(supp_scores)
        max_mask = max_mask | (new_max_mask & (~supp_mask) & max_mask_r1)  # ❕ modified line
    return torch.where(max_mask, scores, zeros)
```
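As a quick check, on the illustrative 1D ramp from above the patched version keeps only the true peak:

```python
scores = torch.tensor([[[[.8, .9, .8, .7, .6, .5, .4]]]])
print(simple_nms(scores, 2).nonzero()[:, -1].tolist())  # [1]: the slope point is rejected
```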
New result:
As a side note, I feel this issue is unlikely to occur often or to matter much (the proof is that SuperGlue works great :) ). Nevertheless, the fix may help a bit (if deemed useful I can submit a PR).
On the other hand, it may not be sensible to introduce this change with the current SuperGlue weights: the keypoint patterns detected by SuperPoint may shift slightly, so the change should at least be evaluated first.
Good catch, and thank you for the detailed analysis and minimal code example. This should ideally be fixed, but I indeed expect it not to make much difference. I'll run some evaluations when I find the time.