Batch support

Open marisancans opened this issue 1 year ago • 3 comments

I suppose batch support is not yet implemented? The inference speed is nice, but inference needs to be batched in order to fully utilize the GPU.

marisancans avatar Jun 28 '23 09:06 marisancans

Hey. Batch support is there for normal inference, but not for adaptive depth/width. However, at least for adaptive width it should be easy to add batch support. I can look into it in the next few days.

Phil26AT avatar Jun 28 '23 15:06 Phil26AT

Using the match_pair function with input shapes [1, 3, 480, 640], where 1 is the batch size, crashes the code. Maybe match_pair is not the correct function to use?

preds = match_pair(self.extractor, self.matcher, prev_frames_t, frames_t)    

Exception has occurred: RuntimeError
Expected 3D (unbatched) or 4D (batched) input to conv2d, but got input of size: [1, 1, 3, 480, 640]
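
The shape in the error message, [1, 1, 3, 480, 640], suggests that match_pair adds a leading batch dimension on its own, so it would expect a single unbatched (C, H, W) image per side. A minimal sketch of that effect with plain tensors, under that assumption (the helper name to_unbatched is hypothetical and not part of LightGlue):

```python
import torch


def to_unbatched(image: torch.Tensor) -> torch.Tensor:
    """Drop a leading batch dimension of size 1, returning a (C, H, W) image."""
    if image.dim() == 4:
        if image.shape[0] != 1:
            raise ValueError("expected a batch of exactly one image")
        return image[0]
    return image


frame = torch.rand(1, 3, 480, 640)   # already batched, as in the report
double_batched = frame[None]         # an extra batch dimension gives a 5D tensor
single = to_unbatched(frame)         # (3, 480, 640)
rebatched = single[None]             # adding the batch dimension once is fine
```

This only works for a batch of one; real batches need the forward passes of the extractor and matcher, as discussed further down in the thread.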

marisancans avatar Jun 29 '23 07:06 marisancans

Any updates on this?

marisancans avatar Jul 06 '23 08:07 marisancans

Hi @marisancans,

match_pair does not support batches, but the forward passes of DISK, SuperPoint and LightGlue support batched inputs. However, there are some problems that need to be solved:

  • If the input images have different shapes, square padding or cropping is required for DISK/SuperPoint. Unfortunately, both DISK and SuperPoint tend to yield plenty of border artifacts, which could be solved by forwarding a padding mask.
  • To batch keypoints/descriptors, you need to ensure that the exact same number of keypoints are detected in each image. You can try reducing the detection_threshold s.t. the extractors always find max_num_keypoints features, or pad the keypoints/descriptors with random numbers.
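
The padding option from the second point could be sketched as follows. This is only an illustration with plain tensors: the helper pad_and_stack and the boolean mask it returns are hypothetical, zero padding is one arbitrary choice, and the mask is meant as bookkeeping for later filtering rather than as an input the matcher is known to accept.

```python
import torch


def pad_and_stack(kpts_list, desc_list, num_keypoints):
    """Pad per-image keypoints/descriptors to num_keypoints and stack them.

    Returns keypoints (B, N, 2), descriptors (B, N, D) and a boolean
    mask (B, N) that is True for real keypoints and False for padding.
    """
    kpts_out, desc_out, valid_out = [], [], []
    for kpts, desc in zip(kpts_list, desc_list):
        n = kpts.shape[0]
        if n > num_keypoints:
            raise ValueError("image has more keypoints than num_keypoints")
        pad = num_keypoints - n
        # Zero padding here; random padding would replace new_zeros below.
        kpts_out.append(torch.cat([kpts, kpts.new_zeros(pad, 2)]))
        desc_out.append(torch.cat([desc, desc.new_zeros(pad, desc.shape[1])]))
        valid = torch.zeros(num_keypoints, dtype=torch.bool)
        valid[:n] = True
        valid_out.append(valid)
    return torch.stack(kpts_out), torch.stack(desc_out), torch.stack(valid_out)


kpts, desc, valid = pad_and_stack(
    [torch.rand(900, 2), torch.rand(1024, 2)],
    [torch.rand(900, 256), torch.rand(1024, 256)],
    num_keypoints=1024,
)
```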

Assuming you have two sets of batched images, a simple pipeline could look like this (After PR #22 is merged):

extractor = SuperPoint(max_num_keypoints=1024, detection_threshold=0.0).eval().cuda()
matcher = LightGlue().eval().cuda()

feats0 = extractor({'image': im0})
feats1 = extractor({'image': im1})
matches01 = matcher({'image0': {**feats0, 'image': im0}, 'image1': {**feats1, 'image': im1}})

Phil26AT avatar Jul 10 '23 17:07 Phil26AT

If detection_threshold is zero, this would then give the same number of keypoints each time, no matter what image pairs are given, right? And then the filtering could be done later on with matches01? This kind of solution would definitely work fine for me. Looking forward to the merge. Thank you for responding, new publication authors usually ignore repo issues :D

marisancans avatar Jul 11 '23 07:07 marisancans

Yes, setting detection_threshold=0.0 should yield the same number of keypoints every time, but there might be some corner cases where fewer than max_num_keypoints have a detection score >0. With the filter_threshold parameter you can then set a confidence threshold on the correspondences. This is also the setup we used for training LightGlue. Thanks for reporting this issue, very glad to see people using LightGlue :)
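
For illustration, the same kind of confidence threshold can also be applied after the fact to batched matcher outputs. The sketch below uses synthetic tensors and assumes the output convention that appears later in this thread (matches0 holds an index into image 1, with -1 for unmatched keypoints); the helper filter_matches is hypothetical.

```python
import torch


def filter_matches(matches0, scores0, threshold):
    """Invalidate matches whose score is below threshold.

    matches0: (B, M) index into image 1 per keypoint of image 0, -1 = unmatched.
    scores0:  (B, M) matching score per keypoint of image 0.
    """
    keep = (matches0 > -1) & (scores0 >= threshold)
    return torch.where(keep, matches0, torch.full_like(matches0, -1))


matches0 = torch.tensor([[2, -1, 0], [1, 1, -1]])
scores0 = torch.tensor([[0.9, 0.0, 0.05], [0.3, 0.8, 0.0]])
filtered = filter_matches(matches0, scores0, threshold=0.2)
```

The output keeps its (B, M) shape, so it stays batch-friendly until the final per-pair extraction.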

Phil26AT avatar Jul 11 '23 08:07 Phil26AT

Can't quite get it to work; I'm getting index errors. My inputs to the matcher are:

data = {'image0': {**feats0, 'image': image0}, 'image1': {**feats1, 'image': image1}}
matches01 = matcher(data)

1024 is the feature count, 4 is the batch size.

Where data looks like:

{
  "image0": {
    "keypoints": (4, 1024, 2),
    "keypoint_scores": (4, 1024),
    "descriptors": (4, 1024, 256),
    "image": (4, 3, 512, 512)
  },
  "image1": {
    "keypoints": (4, 1024, 2),
    "keypoint_scores": (4, 1024),
    "descriptors": (4, 1024, 256),
    "image": (4, 3, 512, 512)
  }
}

Exception has occurred: IndexError
The shape of the mask [4, 1024] at index 0 does not match the shape of the indexed tensor [1, 1024] at index 0
  File "/home/ma/src/nerfstudio_preprocess/LightGlue/lightglue/lightglue.py", line 400, in _forward
    ind0, ind1 = ind0[mask0][None], ind1[mask1][None]
  File "/home/ma/src/nerfstudio_preprocess/LightGlue/lightglue/lightglue.py", line 343, in forward
    return self._forward(data)
  File "/home/ma/src/nerfstudio_preprocess/lightning_system.py", line 27, in match_pair
    matches01 = self.matcher(data)
  File "/home/ma/src/nerfstudio_preprocess/lightning_system.py", line 70, in predict_step
    r = self.match_pair(frames_t, still_batch_t, feats0, feats1[0])
  File "/home/ma/src/nerfstudio_preprocess/gui/callback.py", line 42, in on_predict_batch_end
    for res in output_iterator:
  File "/home/ma/src/nerfstudio_preprocess/walker.py", line 220, in main
    inferencer.predict(ls, dataloader)
  File "/home/ma/src/nerfstudio_preprocess/walker.py", line 239, in <module>
    main()
IndexError: The shape of the mask [4, 1024] at index 0 does not match the shape of the indexed tensor [1, 1024] at index 0

Here are the shapes at line 400 in lightglue.py:

ind0.shape
torch.Size([1, 1024])
ind1.shape
torch.Size([1, 1024])
mask0.shape
torch.Size([4, 1024])
mask1.shape
torch.Size([4, 1024])
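
The mismatch can be reproduced without LightGlue. A minimal sketch using the shapes listed above, running on the CPU with stand-in tensors:

```python
import torch

ind0 = torch.arange(1024)[None]                  # (1, 1024), built for one pair
mask0 = torch.ones(4, 1024, dtype=torch.bool)    # (4, 1024), one row per pair

try:
    ind0[mask0]
    error = None
except IndexError as exc:
    error = str(exc)

# Expanding the index tensor to the batch size makes the shapes agree, but
# boolean indexing then flattens the result to 1D and loses the batch axis.
ind0_batched = ind0.expand(4, -1)
selected = ind0_batched[mask0]                   # shape (4096,)
```

Making the shapes agree is therefore not enough on its own: the masked result no longer says which pair each index belongs to.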

Setting filter_threshold=0.0 in LightGlue does not fix this either.

marisancans avatar Jul 12 '23 09:07 marisancans

You can reproduce this by trying:

data = {
    "image0": {
        "keypoints": torch.rand(4, 1024, 2).cuda(),
        "descriptors": torch.rand(4, 1024, 256).cuda(),
        "image": torch.rand(4, 3, 512, 512).cuda(),
    },
    "image1": {
        "keypoints": torch.rand(4, 1024, 2).cuda(),
        "descriptors": torch.rand(4, 1024, 256).cuda(),
        "image": torch.rand(4, 3, 512, 512).cuda(),
    },
}
matches01 = matcher(data)

Or this:

from lightglue import LightGlue, SuperPoint, DISK
import torch
extractor = SuperPoint(max_num_keypoints=1024, detection_threshold=0.0).eval().cuda()
matcher = LightGlue().eval().cuda()

im0 = torch.rand(4, 3, 512, 512).cuda()
im1 = torch.rand(4, 3, 512, 512).cuda()
feats0 = extractor({'image': im0})
feats1 = extractor({'image': im1})
matches01 = matcher({'image0': {**feats0, 'image': im0}, 'image1': {**feats1, 'image': im1}})

marisancans avatar Jul 12 '23 09:07 marisancans

I can add batch support to ind0 and ind1 like this:

ind0 = torch.stack([torch.arange(0, m).to(device=kpts0.device) for x in range(b)])
ind1 = torch.stack([torch.arange(0, n).to(device=kpts0.device) for x in range(b)]) 

Then I get the next error:

Exception has occurred: IndexError
The shape of the mask [4, 1024] at index 0 does not match the shape of the indexed tensor [2, 4, 1, 1024, 64] at index 2
  File "/home/ma/src/nerfstudio_preprocess/LightGlue/lightglue/lightglue.py", line 404, in _forward
    encoding0 = encoding0[:, :, mask0][:, None]
  File "/home/ma/src/nerfstudio_preprocess/LightGlue/lightglue/lightglue.py", line 343, in forward
    return self._forward(data)
  File "/home/ma/src/nerfstudio_preprocess/lightning_system.py", line 41, in match_pair
    matches01 = self.matcher(data)
  File "/home/ma/src/nerfstudio_preprocess/lightning_system.py", line 84, in predict_step
    r = self.match_pair(frames_t, still_batch_t, feats0, feats1[0])
  File "/home/ma/src/nerfstudio_preprocess/gui/callback.py", line 42, in on_predict_batch_end
    for res in output_iterator:
  File "/home/ma/src/nerfstudio_preprocess/walker.py", line 220, in main
    inferencer.predict(ls, dataloader)
  File "/home/ma/src/nerfstudio_preprocess/walker.py", line 239, in <module>
    main()
IndexError: The shape of the mask [4, 1024] at index 0 does not match the shape of the indexed tensor [2, 4, 1, 1024, 64] at index 2

marisancans avatar Jul 12 '23 10:07 marisancans

Hi @marisancans,

with the latest merge we updated the defaults to perform early stopping / pruning by default. You have to manually switch off adaptive depth and width, which lack batch-support. Then your code should work again :)
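
A sketch of what switching both off might look like. It assumes that depth_confidence and width_confidence are the constructor options controlling adaptive depth and width, and that a value of -1 disables each; please check this against the version of the repository you are using.

```python
# Assumption: depth_confidence / width_confidence are the LightGlue options
# for adaptive depth and width, and -1 disables them.
batched_matcher_conf = {
    "depth_confidence": -1,  # no early stopping
    "width_confidence": -1,  # no point pruning
}


def build_batched_matcher():
    """Create a LightGlue matcher with both adaptive mechanisms disabled."""
    # Imported lazily so this snippet can be read without the package installed.
    from lightglue import LightGlue

    return LightGlue(**batched_matcher_conf).eval()
```

Call build_batched_matcher().cuda() in place of LightGlue().eval().cuda() in the reproduction scripts above.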

Phil26AT avatar Jul 12 '23 11:07 Phil26AT

I'm not sure what I did, but I managed to get it running without disabling anything. You can check whether it is correct, because to be honest I have no idea what I did there. Here are the changes I made:

Added batch support to ind0 and ind1: (screenshot)

Also here: (screenshot)

And here: (screenshot)

Now what's left is postprocessing.

marisancans avatar Jul 12 '23 12:07 marisancans

Your code might run, but I am quite certain it will not yield the expected outcome. While early stopping (depth_confidence) might work (it would compute the confidence over all image pairs in your batch, and stop if the overall confidence is reached), adaptive width cannot work this way, since we resize descriptors/encodings. I suggest validating your results by visualizing them, just like in the demo notebook.
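
One way to see why resizing per image does not batch naturally: pruning keeps a different number of points in each image, so the results no longer fit into one rectangular tensor. A small sketch with made-up scores (the numbers and the 0.5 cut-off are purely illustrative, not LightGlue's pruning rule):

```python
import torch

# Hypothetical per-point "keep" scores for a batch of 2 images, 5 points each.
scores = torch.tensor([[0.9, 0.1, 0.8, 0.7, 0.2],
                       [0.3, 0.1, 0.2, 0.95, 0.05]])
keep = scores > 0.5                     # (2, 5) boolean pruning mask
kept_per_image = keep.sum(dim=1)        # surviving points per image

descriptors = torch.rand(2, 5, 4)       # (B, N, D)

# Per-image pruning gives tensors of different lengths ...
pruned = [d[k] for d, k in zip(descriptors, keep)]

# ... so they cannot be stacked back into one (B, N', D) tensor.
try:
    torch.stack(pruned)
    stack_error = None
except RuntimeError as exc:
    stack_error = str(exc)
```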

Phil26AT avatar Jul 12 '23 13:07 Phil26AT

You are right, the results are weird and not many points are found. For now it looks like I got it working in batch mode, and the performance increase is huge. Thanks for the support!

marisancans avatar Jul 12 '23 13:07 marisancans

For future reference in case someone stumbles upon the same problem:

This is how I am doing batching. First, get the features:

# Where im0 and im1 are image pairs (B, C, H, W)
extractor = SuperPoint(max_num_keypoints=1024, detection_threshold=0.0).eval().cuda()
feats0 = extractor({ "image":  im0 })
feats1 = extractor({ "image":  im1 })

Then run the matcher: custom_match_pair(im0, im1, feats0, feats1)

Finally, filter the points. Note that I haven't yet figured out how to do scaling (but I think it's possible; currently both images in a pair are the same size).


def custom_match_pair(self, image0, image1, feats0, feats1):        
        data = {'image0': {**feats0, 'image': image0}, 'image1': {**feats1, 'image': image1}}

        matches01 = self.matcher(data)
        matches0, mscores0 = matches01['matches0'], matches01['matching_scores0']

        pred = {**{k+'0': v for k, v in feats0.items()}, 
                **{k+'1': v for k, v in feats1.items()},
                **matches01}
        # Maybe needed if you get funky nvidia errors
        # pred = {k: v.detach() if isinstance(v, torch.Tensor) else v for k, v in matches01.items()}


        # if scales0 is not None:
        #     pred['keypoints0'] = (pred['keypoints0'] + 0.5) / scales0[None] - 0.5
        # if scales1 is not None:
        #     pred['keypoints1'] = (pred['keypoints1'] + 0.5) / scales1[None] - 0.5
        # del feats0, feats1
        # torch.cuda.empty_cache()

        # create match indices
        # matches0, mscores0 = pred['matches0'], pred['matching_scores0'] # matching_scores

        m_kpts0_all = []
        m_kpts1_all = []
        matching_scores_all = []

        valid = matches0 > -1

        for v, m, kp0, kp1, scores in zip(valid, matches0, pred['keypoints0'], pred['keypoints1'], mscores0):
            matches = torch.stack([torch.where(v)[0], m[v]], -1)

            m_kpts0, m_kpts1 = kp0[matches[..., 0]], kp1[matches[..., 1]]

            matching_scores = scores[v]

            m_kpts0_all.append(m_kpts0)
            m_kpts1_all.append(m_kpts1)
            matching_scores_all.append(matching_scores)


        return m_kpts0_all, m_kpts1_all, matching_scores_all
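
For the scaling step left open above, a batched variant of the commented-out formula could look like the sketch below. It assumes keypoints of shape (B, N, 2) and one pair of resize factors per image, shape (B, 2), defined as resized size divided by original size; the helper rescale_keypoints is hypothetical, and the only change from the commented-out lines is indexing scales with [:, None] instead of [None].

```python
import torch


def rescale_keypoints(kpts, scales):
    """Map keypoints from resized-image to original-image coordinates.

    kpts:   (B, N, 2) keypoints in the resized images.
    scales: (B, 2) per-image resize factors (assumed: resized / original size).
    """
    # scales[:, None] has shape (B, 1, 2) and broadcasts over the N keypoints.
    return (kpts + 0.5) / scales[:, None] - 0.5


kpts = torch.tensor([[[9.5, 4.5]], [[0.0, 0.0]]])   # (2, 1, 2)
scales = torch.tensor([[0.5, 0.5], [1.0, 1.0]])      # (2, 2)
restored = rescale_keypoints(kpts, scales)
```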

marisancans avatar Jul 12 '23 13:07 marisancans

Hi, following the suggestion above to pad the keypoints/descriptors, I pad the keypoints with 0, but I get a low accuracy of around 5%. Is there a better way to solve this? Because the images in my dataset are quite small, I set max_num_keypoints = 128 to avoid the problem.
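
If padding is used, one thing that can at least be checked is whether padded entries end up in the reported matches. The sketch below drops every match that involves a padded keypoint, assuming you kept boolean validity masks while padding (valid0, valid1 and the helper drop_padded_matches are hypothetical). It only cleans the output and does not prevent padded points from influencing the matcher internally, so it may not recover the accuracy on its own.

```python
import torch


def drop_padded_matches(matches0, valid0, valid1):
    """Set matches that involve a padded keypoint to -1.

    matches0: (B, M) index into image 1 per keypoint of image 0, -1 = unmatched.
    valid0:   (B, M) True for real keypoints of image 0.
    valid1:   (B, N) True for real keypoints of image 1.
    """
    matched = matches0 > -1
    # Look up whether the matched partner in image 1 is a real keypoint.
    partner_valid = torch.gather(valid1, 1, matches0.clamp(min=0))
    keep = matched & valid0 & partner_valid
    return torch.where(keep, matches0, torch.full_like(matches0, -1))


matches0 = torch.tensor([[1, 2, -1, 0]])
valid0 = torch.tensor([[True, True, True, False]])
valid1 = torch.tensor([[True, True, False]])
cleaned = drop_padded_matches(matches0, valid0, valid1)
```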

mmmmmjaai avatar Oct 08 '23 08:10 mmmmmjaai