deep-image-matching Error when using Roma

Hi, and thanks for this code! I have installed it in a conda environment and am running this command:

python main.py --dir assets/test --pipeline roma

Features are extracted, but the matcher fails with:

2024-10-04 10:33:28 | [INFO    ] Features extracted!
2024-10-04 10:33:28 | [INFO    ] Matching features with roma...
2024-10-04 10:33:28 | [INFO    ] roma configuration:
{'name': 'roma', 'pretrained': 'outdoor'}
2024-10-04 10:33:28 | [INFO    ] Matching features...
2024-10-04 10:33:28 | [INFO    ]
  0%|                                                                                            | 0/3 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "C:\Users\B\deep-image-matching\main.py", line 49, in <module>
    match_path = img_matching.match_pairs(feature_path)
  File "C:\Users\B\deep-image-matching\src\deep_image_matching\image_matching.py", line 427, in match_pairs
    self._matcher.match(
  File "C:\Users\B\deep-image-matching\src\deep_image_matching\matchers\roma.py", line 89, in match
    matches = self._match_pairs(self._feature_path, img0, img1)
  File "F:\Conda\envs\deep-image-matching\lib\site-packages\torch\utils\_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "C:\Users\B\deep-image-matching\src\deep_image_matching\matchers\roma.py", line 165, in _match_pairs
    warp, certainty = self.matcher.match(
  File "F:\Conda\envs\deep-image-matching\lib\site-packages\torch\utils\_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "C:\Users\B\deep-image-matching\src\deep_image_matching\thirdparty\RoMa\roma\models\matcher.py", line 709, in match
    im_A, im_B = test_transform((im_A, im_B))
  File "C:\Users\B\deep-image-matching\src\deep_image_matching\thirdparty\RoMa\roma\utils\utils.py", line 292, in __call__
    im_tuple = t(im_tuple)
  File "C:\Users\B\deep-image-matching\src\deep_image_matching\thirdparty\RoMa\roma\utils\utils.py", line 209, in __call__
    return [self.to_tensor(im) for im in im_tuple]
  File "C:\Users\B\deep-image-matching\src\deep_image_matching\thirdparty\RoMa\roma\utils\utils.py", line 209, in <listcomp>
    return [self.to_tensor(im) for im in im_tuple]
  File "C:\Users\B\deep-image-matching\src\deep_image_matching\thirdparty\RoMa\roma\utils\utils.py", line 194, in __call__
    im = np.array(im, dtype=np.float32).transpose((2, 0, 1))
ValueError: axes don't match array

I am attaching the config here as well. What can I do to resolve this? Thanks!

config.json

Oct 04 '24 09:10 antithing

Hi, thanks for reporting. Are you on the dev branch? Could you try running this basic example to see whether you get the same issue:

python ./main.py -d .\assets\example_cyprus -p roma --skip_reconstruction --force

Oct 04 '24 12:10 lcmrl

I installed via pip, inside conda. Will check with that command and report back. Thanks!

Oct 04 '24 12:10 antithing

Hi, I think I have found the issue. With the Cyprus example dataset, things look okay; with a custom dataset where the images are of different sizes, I get the error.

Could the different sizes be the problem here? How can I get around it?

Oct 07 '24 14:10 antithing

Hi, what is the size of the images?

Oct 07 '24 14:10 lcmrl

It is a 3-image dataset:

2 x 1280 x 800
1 x 1920 x 1080

Oct 07 '24 15:10 antithing

More testing: even if I remove the 1920x1080 image, I see the same error. With LoFTR it runs perfectly, so this is specific to the roma pipeline.

Oct 07 '24 15:10 antithing

Are your images grayscale or RGB?

Oct 07 '24 15:10 lcmrl

greyscale

Oct 07 '24 16:10 antithing

Could you try with some RGB images, if you have any? That may be the issue. If so, we could convert grayscale images to RGB, which should solve the problem.
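
For anyone hitting the same traceback, here is a minimal sketch of why a grayscale image triggers it and how a conversion avoids it. It only reproduces the `transpose((2, 0, 1))` call shown in the traceback; the helper `ensure_rgb` is a hypothetical name for illustration, not a function from deep-image-matching or RoMa.

```python
import numpy as np


def ensure_rgb(image: np.ndarray) -> np.ndarray:
    """Return an H x W x 3 array, replicating the gray channel if needed.

    Hypothetical helper for illustration only.
    """
    if image.ndim == 2:
        # Grayscale stored as H x W: copy the single channel three times.
        return np.stack([image] * 3, axis=-1)
    if image.ndim == 3 and image.shape[2] == 1:
        # Grayscale stored as H x W x 1.
        return np.repeat(image, 3, axis=2)
    return image


# Stand-in for a 1280 x 800 grayscale image (height 800, width 1280).
gray = np.zeros((800, 1280), dtype=np.uint8)

# Same call as in the traceback: a 2-D array has no third axis to move.
try:
    np.array(gray, dtype=np.float32).transpose((2, 0, 1))
except ValueError as err:
    print("grayscale input fails:", err)

# After conversion the array has three axes, so the transpose succeeds.
rgb = ensure_rgb(gray)
chw = np.array(rgb, dtype=np.float32).transpose((2, 0, 1))
print(chw.shape)  # (3, 800, 1280)
```

If the images are loaded with Pillow, calling `Image.open(path).convert("RGB")` before matching should have the same effect.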

Oct 07 '24 16:10 lcmrl

That resolved it. Thank you!

I have a workflow question if you have a minute...

My dataset is 3 images, two are from a stereo camera, and one is from a completely different camera.

As I know the extrinsics for the stereo camera, I am doing the following steps:

1. Match the stereo pair with Roma.
2. Create an initial model with colmap.
3. Run rigbundleadjuster to constrain the model from the stereo intrinsics and give world scale.

Now I need to add the third image to the dataset and bundle adjust it. What is the best way to do this?

The reason I am going to all this trouble is that when I throw all the images in at once, the resulting camera poses are incorrect (in this case I am more interested in camera poses than in points).

Thanks!

Oct 08 '24 18:10 antithing

Hi, have you tried SuperPoint instead of Roma? You know the relative pose between the two stereo cameras, but do you have reliable intrinsics and distortion parameters for all three cameras?

Oct 09 '24 08:10 lcmrl

Hi, SuperPoint seems to initialise more easily with this dataset and looks good when running just the stereo pair. Running all three images at once still gives incorrect results. I have intrinsics and extrinsics for the stereo pair. The goal here is to find the relative transform to the third camera.

Oct 09 '24 09:10 antithing

When you say incorrect, do you mean completely incorrect, or off by a certain error that is not so big? In general, for this kind of procedure, a good solution is to move the triplet of cameras around a scene with good texture, so that the final estimate is reliable.

Oct 09 '24 09:10 lcmrl

Thank you! To link images with their camera, is the best way to create a yaml config? Or is there an option where each camera has its own image directory?

Oct 09 '24 15:10 antithing

There are different ways to go; the easiest is to put each camera's images in a separate subfolder. Then, when you run rig_bundle_adjuster, you pass a config file like the following. Here cam0 is the name of the first subfolder. The two lines cam_from_rig_rotation and cam_from_rig_translation are optional: if you omit them, COLMAP will try to estimate the relative poses. In your case, that is the scenario for the third camera.

[
  {
    "ref_camera_id": 1,
    "cameras": [
      {
        "camera_id": 1,
        "image_prefix": "cam0",
        "cam_from_rig_rotation": [1, 0, 0, 0],
        "cam_from_rig_translation": [0, 0, 0]
      },
      {
        "camera_id": 2,
        "image_prefix": "cam1",
        "cam_from_rig_rotation": [1, 0, 0, 0],
        "cam_from_rig_translation": [0.120, 0, 0]
      }
    ]
  }
]
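
As a rough sketch of how the three-camera case described above might be written, the snippet below builds the same structure in Python and leaves out the two optional pose lines for the third camera. The third entry (`camera_id` 3, subfolder `cam2`) and the helper `rig_camera` are assumptions for illustration; adjust the ids and prefixes to match your own database and folders.

```python
import json


def rig_camera(camera_id, image_prefix, rotation=None, translation=None):
    """Build one camera entry; pose keys are written only when both are given.

    Hypothetical helper for illustration only.
    """
    entry = {"camera_id": camera_id, "image_prefix": image_prefix}
    if rotation is not None and translation is not None:
        entry["cam_from_rig_rotation"] = rotation
        entry["cam_from_rig_translation"] = translation
    return entry


rig_config = [
    {
        "ref_camera_id": 1,
        "cameras": [
            # Stereo pair with known relative pose, values as in the config above.
            rig_camera(1, "cam0", [1, 0, 0, 0], [0, 0, 0]),
            rig_camera(2, "cam1", [1, 0, 0, 0], [0.120, 0, 0]),
            # Assumed third camera: pose omitted so it is left to be estimated.
            rig_camera(3, "cam2"),
        ],
    }
]

# Serialize to JSON text; write this to the config file you pass to COLMAP.
text = json.dumps(rig_config, indent=2)
print(text)
```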

Oct 09 '24 16:10 lcmrl

Hi, I am closing the issue. Feel free to reopen it if you run into any other problem, or to contribute to the project!

Oct 11 '24 13:10 lcmrl