
Do NMS before or after keypoints are detected, what's the difference?

Vincentqyw opened this issue 2 years ago

Hi, I noticed that there are slight differences between Daniel's Magic Leap SuperPoint implementation and this repo. In this repo, NMS (defined by a max-pooling kernel) is performed on the heatmap/scores matrix (https://github.com/magicleap/SuperGluePretrainedNetwork/blob/master/models/superpoint.py#L167), but in Daniel's version, NMS is performed on the keypoint locations (https://github.com/magicleap/SuperPointPretrainedNetwork/blob/master/demo_superpoint.py#L258). I can't figure out why you made this revision and what the differences in the final keypoint locations are. Thanks for your reply :) @Skydes

Vincentqyw avatar Apr 01 '22 06:04 Vincentqyw

Hi @Vincentqyw, good question! The two approaches produce nearly identical results, but the implementation in this repo:

  • is much faster: it can run on the GPU (it operates on a torch.Tensor) and takes near-constant time (instead of iterating over the keypoints);
  • can be batched if the images have the same size.

The whole forward pass of SuperPoint is actually batched, so we used this implementation of SuperPoint in our training pipeline, detecting & describing data-augmented images on the fly.
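For intuition, the local-maximum test at the core of this approach can be sketched in a few lines (a minimal sketch, not the exact repo code: the repo's simple_nms adds an iterative recovery step on top of this):

```python
import torch
import torch.nn.functional as F

def local_max_suppress(scores: torch.Tensor, nms_radius: int) -> torch.Tensor:
    """Keep a pixel iff it equals the max of its (2r+1)x(2r+1) neighborhood."""
    pooled = F.max_pool2d(
        scores, kernel_size=nms_radius * 2 + 1, stride=1, padding=nms_radius)
    return torch.where(scores == pooled, scores, torch.zeros_like(scores))

# One call suppresses a whole batch of same-sized score maps, on CPU or GPU,
# with no Python loop over individual keypoints:
batch = torch.rand(8, 1, 120, 160)             # (B, 1, H, W) score maps
out = local_max_suppress(batch, nms_radius=4)  # same shape, non-maxima zeroed
```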

sarlinpe avatar Apr 01 '22 07:04 sarlinpe

Excellent! Thx again for your prompt reply!

Vincentqyw avatar Apr 01 '22 08:04 Vincentqyw

Hi @Skydes, if I may ask something related: why should points that are not initially detected as local maxima in max_mask = scores == max_pool(scores) be added back in max_mask = max_mask | (new_max_mask & (~supp_mask))? https://github.com/magicleap/SuperGluePretrainedNetwork/blob/ddcf11f42e7e0732a0c4607648f9448ea8d73590/models/superpoint.py#L55-L62 I'm struggling to see the intuition behind this decision. I believe this is also a difference with respect to Daniel's SuperPoint implementation. Thanks in advance! :)

javierttgg avatar Jun 26 '22 14:06 javierttgg

The initialization in L56 is too conservative because it doesn't account for chains: consider a point A that suppresses B, which itself suppresses C. Daniel's original implementation would retain both A and C if they are sufficiently far from each other, while L56 would retain only A. By removing the suppressed pixels supp_mask from supp_scores and propagating the scores for a few iterations, we recover points like C. This is a very good approximation and much faster than the original iterative implementation.
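To make the chain concrete, here is a toy 1-D illustration (simple_nms is copied from this repo's models/superpoint.py; the positions and scores of A, B, C are made up). With radius 2, A at x=0 suppresses B at x=2, which would in turn suppress C at x=4; A and C are 4 apart, so both should survive, yet the single-pass mask keeps only A:

```python
import torch

def simple_nms(scores, nms_radius: int):
    """Fast NMS, as in this repo's models/superpoint.py."""
    assert nms_radius >= 0

    def max_pool(x):
        return torch.nn.functional.max_pool2d(
            x, kernel_size=nms_radius * 2 + 1, stride=1, padding=nms_radius)

    zeros = torch.zeros_like(scores)
    max_mask = scores == max_pool(scores)
    for _ in range(2):
        supp_mask = max_pool(max_mask.float()) > 0
        supp_scores = torch.where(supp_mask, zeros, scores)
        new_max_mask = supp_scores == max_pool(supp_scores)
        max_mask = max_mask | (new_max_mask & (~supp_mask))
    return torch.where(max_mask, scores, zeros)

# A=0.9 at x=0, B=0.8 at x=2, C=0.7 at x=4:
row = torch.zeros(1, 1, 1, 11)
row[..., 0], row[..., 2], row[..., 4] = 0.9, 0.8, 0.7

# Single-pass local maxima: among the nonzero pixels, only A survives.
pooled = torch.nn.functional.max_pool2d(row, kernel_size=5, stride=1, padding=2)
single_pass = (row == pooled) & (row > 0)

# The recovery loop brings back C: A and C end up nonzero, B stays suppressed.
out = simple_nms(row, nms_radius=2)
```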

sarlinpe avatar Jul 26 '22 07:07 sarlinpe

Understood, thanks a lot @Skydes

javierttgg avatar Jul 26 '22 14:07 javierttgg

Sorry to bother you again @Skydes. I experimented a bit with the code and found a possibly undesired result.

This way of performing NMS can produce false positives. See for instance this 1-D example with an NMS radius of 2:

[image: 1-D example]

In this case the false positive is 4.

A more complete example with nms_radius = 4:

[image: scores, expected, and actual result]

[Code for reproduction]
import torch
import matplotlib.pyplot as plt


def simple_nms(scores, nms_radius: int):
    """ Fast Non-maximum suppression to remove nearby points """
    assert(nms_radius >= 0)

    def max_pool(x):
        return torch.nn.functional.max_pool2d(
            x, kernel_size=nms_radius*2+1, stride=1, padding=nms_radius)

    zeros = torch.zeros_like(scores)
    max_mask = scores == max_pool(scores)
    for _ in range(2):
        supp_mask = max_pool(max_mask.float()) > 0
        supp_scores = torch.where(supp_mask, zeros, scores)
        new_max_mask = supp_scores == max_pool(supp_scores)
        max_mask = max_mask | (new_max_mask & (~supp_mask))
    return torch.where(max_mask, scores, zeros)


if __name__ == "__main__":
    
    nms_radius = 4
    # First example: a single ridge (overwritten by the second example below).
    scores = torch.tensor(
        [[[[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
           [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
           [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
           [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
           [.7, .8, .9, .8, .7, .6, .5, .4, .3, .2, .1],
           [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
           [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
           [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
           [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]]]]
        )
    
    # Second example: a 2-D cone peaking at 0.9 (used for the plots below).
    scores = torch.tensor(
        [[[[.3, .4, .5, .4, .3, .2, .1, .0, .0, .0, .0],
           [.4, .5, .6, .5, .4, .3, .2, .1, .0, .0, .0],
           [.5, .6, .7, .6, .5, .4, .3, .2, .1, .0, .0],
           [.6, .7, .8, .7, .6, .5, .4, .3, .2, .1, .0],
           [.7, .8, .9, .8, .7, .6, .5, .4, .3, .2, .1],
           [.6, .7, .8, .7, .6, .5, .4, .3, .2, .1, .0],
           [.5, .6, .7, .6, .5, .4, .3, .2, .1, .0, .0],
           [.4, .5, .6, .5, .4, .3, .2, .1, .0, .0, .0],
           [.3, .4, .5, .4, .3, .2, .1, .0, .0, .0, .0]]]]
        )
    
    nms = simple_nms(scores, nms_radius)
    
    fig, axes = plt.subplots(ncols=3)
    titles = ['scores', 'expected', 'actual result']
    
    axes[0].imshow(
        scores.numpy()[0,0], 
        cmap='jet', 
        vmin=0., vmax=0.9)
    
    axes[1].imshow(
        (scores == 0.9).float()[0,0].numpy(), 
        cmap='Greens', 
        vmin=0, vmax=1)
    
    axes[2].imshow(
        (nms > 0).float()[0,0].numpy(), 
        cmap='Greens', 
        vmin=0, vmax=1)
    
    for axi, title in zip(axes, titles):
        axi.set(title=title)
    fig.tight_layout()
    plt.show()

I believe this issue can be corrected easily by checking, with an NMS radius of 1, whether the newly detected keypoints are local maxima, i.e.:

def simple_nms(scores, nms_radius: int):
    """ Fast Non-maximum suppression to remove nearby points """
    assert(nms_radius >= 0)

    def max_pool(x):
        return torch.nn.functional.max_pool2d(
            x, kernel_size=nms_radius*2+1, stride=1, padding=nms_radius)
    
    # ❕ added function  
    def max_pool_r1(x):
        return torch.nn.functional.max_pool2d(
            x, kernel_size=3, stride=1, padding=1)

    zeros = torch.zeros_like(scores)
    max_mask = scores == max_pool(scores)
    max_mask_r1 = scores == max_pool_r1(scores) # ❕ added line
    for _ in range(2):
        supp_mask = max_pool(max_mask.float()) > 0
        supp_scores = torch.where(supp_mask, zeros, scores)
        new_max_mask = supp_scores == max_pool(supp_scores)
        max_mask = max_mask | (new_max_mask & (~supp_mask) & max_mask_r1) # ❕ modified line
    return torch.where(max_mask, scores, zeros)
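As a quick sanity check (a sketch using a made-up 1-D ramp, in the spirit of the radius-2 example above), the patched version keeps only the true peak of a monotone ramp; its two max-pool radii are folded into one helper, but the behavior matches the proposed code:

```python
import torch

def simple_nms_fixed(scores, nms_radius: int):
    """simple_nms with the radius-1 local-maximum check proposed above."""
    assert nms_radius >= 0

    def max_pool(x, r):
        return torch.nn.functional.max_pool2d(
            x, kernel_size=r * 2 + 1, stride=1, padding=r)

    zeros = torch.zeros_like(scores)
    max_mask = scores == max_pool(scores, nms_radius)
    max_mask_r1 = scores == max_pool(scores, 1)  # radius-1 local maxima
    for _ in range(2):
        supp_mask = max_pool(max_mask.float(), nms_radius) > 0
        supp_scores = torch.where(supp_mask, zeros, scores)
        new_max_mask = supp_scores == max_pool(supp_scores, nms_radius)
        max_mask = max_mask | (new_max_mask & (~supp_mask) & max_mask_r1)
    return torch.where(max_mask, scores, zeros)

# A monotone ramp peaking at 0.9; with radius 2, only the peak is a genuine
# local maximum, so only it should survive NMS.
ramp = torch.tensor([.7, .8, .9, .8, .7, .6, .5, .4, .3, .2, .1]).view(1, 1, 1, -1)
out = simple_nms_fixed(ramp, nms_radius=2)
# Only the 0.9 peak remains nonzero; the unpatched version also keeps
# ramp points that are not local maxima.
```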

New result:

[image: new result after the fix]

As a side note, I feel that this issue is unlikely to occur often or to matter much (the proof is that SuperGlue works great :) ). Nevertheless, the fix may help a bit (if deemed useful, I can submit a PR).

On the other hand, it may not be sensible to introduce this change with the current SuperGlue weights: after this change, the keypoint patterns detected by SuperPoint may change a bit, so it should at least be tested first.

javierttgg avatar Jul 29 '22 19:07 javierttgg

Good catch, and thank you for the detailed analysis and the minimal code example. This should ideally be fixed, but I indeed expect it not to make much of a difference. I'll run some evaluations when I find the time.

sarlinpe avatar Aug 18 '22 07:08 sarlinpe