
Supporting No Depth Input - RGB SLAM

Open StarsTesla opened this issue 1 year ago • 10 comments

Hi there, first of all, thanks for your great work. I installed the environment, and here is my situation:

I cannot connect my iPhone to the server because they are not on the same Wi-Fi, only on the same local network. So I decided to capture the dataset offline with NeRFCapture, which gives me a directory of images and a transforms.json.

Then I tried to restructure the dataset to fit `python scripts/splatam.py configs/iphone/splatam.py`. For instance, I renamed images/0 to rgb/0.png and kept everything else the same.

But when I directly run `python scripts/splatam.py configs/iphone/splatam.py`, I get the following error:

  File "/root/anaconda3/envs/3dgs/lib/python3.9/site-packages/imageio/core/imopen.py", line 113, in imopen
    request = Request(uri, io_mode, format_hint=format_hint, extension=extension)
  File "/root/anaconda3/envs/3dgs/lib/python3.9/site-packages/imageio/core/request.py", line 247, in __init__
    self._parse_uri(uri)
  File "/root/anaconda3/envs/3dgs/lib/python3.9/site-packages/imageio/core/request.py", line 407, in _parse_uri
    raise FileNotFoundError("No such file: '%s'" % fn)
FileNotFoundError: No such file: '/home/xingchenzhou/code/git/SplaTAM/experiments/iPhone_Captures/offline_demo/depth/0.png'

This should not happen: from what I saw in the dataset conversion Python script, depth is optional, not required.

I also tried to run the other scripts and got:

FileNotFoundError: [Errno 2] No such file or directory: '././experiments/iPhone_Captures/offline_demo/SplaTAM_iPhone/params.npz'

So, any idea?

StarsTesla avatar Dec 07 '23 03:12 StarsTesla

Hi, thanks for trying out the code!

SplaTAM requires depth input for running SLAM & reconstruction. Our dataloaders, by default, expect both an rgb and a depth folder. We haven't tested the offline setting in the NeRFCapture app; we have only tested our scripts, which interface with NeRFCapture in online mode.

I just checked capturing an offline dataset with NeRFCapture. It looks like both the RGB and depth PNGs are saved under the images folder. You would need to move the .depth.png images to a depth folder and rename them to .png, in addition to renaming the images folder to rgb. This should be a pretty simple script to write.
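
For illustration, a minimal sketch of such a script, assuming the offline capture layout described above (the folder layout and file naming here are my assumptions, not something shipped with SplaTAM):

import shutil
from pathlib import Path

# Assumed offline NeRFCapture layout: <capture_dir>/images/ holds
# "<id>.png" (RGB) and "<id>.depth.png" (depth), plus transforms.json.
capture_dir = Path("experiments/iPhone_Captures/offline_demo")
images_dir = capture_dir / "images"
rgb_dir = capture_dir / "rgb"
depth_dir = capture_dir / "depth"
rgb_dir.mkdir(exist_ok=True)
depth_dir.mkdir(exist_ok=True)

for f in sorted(images_dir.iterdir()):
    if f.name.endswith(".depth.png"):
        # e.g. 0.depth.png -> depth/0.png
        shutil.copy(f, depth_dir / f.name.replace(".depth.png", ".png"))
    elif f.suffix == ".png":
        # e.g. 0.png -> rgb/0.png
        shutil.copy(f, rgb_dir / f.name)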

Nik-V9 avatar Dec 07 '23 04:12 Nik-V9

I think the offline mode in the Nerfcapture app is broken, as pointed out by the app's developer here: https://github.com/jc211/NeRFCapture/issues/10#issuecomment-1701908651. I tried renaming the files yesterday, but that does not seem to cut it: the tensor dimensions were off, and the depth PNG files themselves looked wrong; they do not seem to store the full depth range. This is probably a bit different from how the online mode works.

I'm actually trying to get this to work using our own iOS data collection app (not related to Nerfcapture or SplaTAM), see here for details, but I'm not sure if we got the depth conversion correct yet.

If I understood this comment correctly:

  • the online mode reads (uint32 or float32?) data from the Nerfcapture app
  • then scales that by some number (1/10?) and saves to PNG
  • the scale of the original depth image (before saving to PNG) is assumed to be 65535 units = 1m
  • so the PNG depth scale here is 6553.5 units = 1m
  • other datasets are configured to use other depth scaling, more typically 1000 units = 1m, i.e., depth in millimeters

So, to summarize: if an app/script were to export RGBD data where the depth PNGs have a depth scale of 6553.5, and the camera intrinsics are correctly set in transforms.json, it should probably work?
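
For concreteness, a minimal sketch of the load/save convention this would imply (the 6553.5 factor and the 10 m range are taken from the bullet points above; the helper names and the use of imageio are mine, not SplaTAM code):

import numpy as np
import imageio.v3 as iio

PNG_DEPTH_SCALE = 6553.5  # PNG units per meter (so 65535 units = 10 m)

def save_depth_png(depth_m, path):
    # clamp to the representable 0-10 m range, then store as a 16-bit PNG
    depth_png = np.clip(depth_m * PNG_DEPTH_SCALE, 0, 65535).astype(np.uint16)
    iio.imwrite(path, depth_png)

def load_depth_m(path):
    # inverse operation: PNG units back to meters
    return iio.imread(path).astype(np.float32) / PNG_DEPTH_SCALE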

oseiskar avatar Dec 07 '23 10:12 oseiskar

Hi, thanks for trying out the code!

SplaTAM requires depth input for running SLAM & reconstruction. Our dataloaders, by default, expect both an rgb and a depth folder. We haven't tested the offline setting in the NeRFCapture app; we have only tested our scripts, which interface with NeRFCapture in online mode.

I just checked capturing an offline dataset with NeRFCapture. It looks like both the RGB and depth PNGs are saved under the images folder. You would need to move the .depth.png images to a depth folder and rename them to .png, in addition to renaming the images folder to rgb. This should be a pretty simple script to write.

Hello, I am running the program in WSL and I don't know how to keep WSL and my phone on the same network segment, so I used NeRFCapture for offline data collection. However, after the capture was complete, I found that the folder on my phone only contains the color images and transforms.json, and does not include any depth maps. My phone is an iPhone 14; is it unable to capture depth maps?

LemonSoda-RPG avatar Dec 07 '23 12:12 LemonSoda-RPG

@Nik-V9 I did check the images dir; there are only RGB images. Does the data need to be collected with a LiDAR-equipped iPhone? Maybe this could be improved by using something like MiDaS or MVSNet to get the depth?

StarsTesla avatar Dec 07 '23 14:12 StarsTesla

Hi, thanks for trying out the code! SplaTAM requires depth input for running SLAM & reconstruction. Our dataloaders, by default, expect both an rgb and a depth folder. We haven't tested the offline setting in the NeRFCapture app; we have only tested our scripts, which interface with NeRFCapture in online mode. I just checked capturing an offline dataset with NeRFCapture. It looks like both the RGB and depth PNGs are saved under the images folder. You would need to move the .depth.png images to a depth folder and rename them to .png, in addition to renaming the images folder to rgb. This should be a pretty simple script to write.

Hello, I am running the program in WSL and I don't know how to keep WSL and my phone on the same network segment, so I used NeRFCapture for offline data collection. However, after the capture was complete, I found that the folder on my phone only contains the color images and transforms.json, and does not include any depth maps. My phone is an iPhone 14; is it unable to capture depth maps?

Hi, I also faced the same issue and found that only the Pro models have the LiDAR sensor.

H-tr avatar Dec 07 '23 16:12 H-tr

@Nik-V9 I did check the images dir; there are only RGB images. Does the data need to be collected with a LiDAR-equipped iPhone? Maybe this could be improved by using something like MiDaS or MVSNet to get the depth?

Yes, you need a LiDAR-equipped iPhone for the demo.

Using a depth estimation network would make the method up-to-scale (not metric), since monocular depth wouldn't have scale. Your camera tracking performance would be influenced by the accuracy and multi-view consistency of the depth estimation network. An RGB-only SLAM method using 3D Gaussians is currently future research and is one of the things we might consider.
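
For anyone who wants to experiment with this anyway, here is a rough sketch of getting monocular depth from MiDaS via torch.hub (this is not part of SplaTAM, the input path is illustrative, and the output is relative inverse depth, so it would still need a per-frame scale/shift alignment before it could stand in for metric depth):

import cv2
import torch

# Load a small MiDaS model and its matching preprocessing transform from torch.hub
midas = torch.hub.load("intel-isl/MiDaS", "MiDaS_small")
midas.eval()
transform = torch.hub.load("intel-isl/MiDaS", "transforms").small_transform

img = cv2.cvtColor(cv2.imread("rgb/0.png"), cv2.COLOR_BGR2RGB)  # illustrative path
with torch.no_grad():
    pred = midas(transform(img))
    # resize the prediction back to the input resolution
    pred = torch.nn.functional.interpolate(
        pred.unsqueeze(1), size=img.shape[:2], mode="bicubic", align_corners=False
    ).squeeze()

relative_inverse_depth = pred.numpy()  # unitless; NOT metric depth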

Nik-V9 avatar Dec 07 '23 22:12 Nik-V9

I think the offline mode in the Nerfcapture app is broken, as pointed out by the app's developer here: jc211/NeRFCapture#10 (comment). I tried renaming the files yesterday, but that does not seem to cut it: the tensor dimensions were off, and the depth PNG files themselves looked wrong; they do not seem to store the full depth range. This is probably a bit different from how the online mode works.

Yes, this is correct. Looks like the offline mode is broken. So far, we have only used the online mode.

I'm actually trying to get this to work using our own iOS data collection app (not related to Nerfcapture or SplaTAM), see here for details, but I'm not sure if we got the depth conversion correct yet.

This looks like a cool app.

So, to summarize: if an app/script were to export RGBD data where the depth PNGs have a depth scale of 6553.5, and the camera intrinsics are correctly set in transforms.json, it should probably work?

Yes! We need depth and intrinsics in addition to RGB for SLAM. The depth scale doesn't specifically have to be 6553.5 (as long as the pixel intensity to meter scaling is known). That's what our iPhone dataloader is currently hardcoded to: https://github.com/spla-tam/SplaTAM/blob/9e74e998356e97fca060330b854b921e674c98e6/datasets/gradslam_datasets/nerfcapture.py#L49

Nik-V9 avatar Dec 07 '23 23:12 Nik-V9

I think the offline mode in the Nerfcapture app is broken, as pointed out by the app's developer here: jc211/NeRFCapture#10 (comment). I tried renaming the files yesterday, but that does not seem to cut it: the tensor dimensions were off, and the depth PNG files themselves looked wrong; they do not seem to store the full depth range. This is probably a bit different from how the online mode works.

Yes, this is correct. Looks like the offline mode is broken. So far, we have only used the online mode.

I'm actually trying to get this to work using our own iOS data collection app (not related to Nerfcapture or SplaTAM), see here for details, but I'm not sure if we got the depth conversion correct yet.

This looks like a cool app.

So, to summarize: if an app/script were to export RGBD data where the depth PNGs have a depth scale of 6553.5, and the camera intrinsics are correctly set in transforms.json, it should probably work?

Yes! We need depth and intrinsics in addition to RGB for SLAM. The depth scale doesn't specifically have to be 6553.5 (as long as the pixel intensity to meter scaling is known). That's what our iPhone dataloader is currently hardcoded to:

https://github.com/spla-tam/SplaTAM/blob/9e74e998356e97fca060330b854b921e674c98e6/datasets/gradslam_datasets/nerfcapture.py#L49

Where does the 6553.5 number come from? I'm also trying to get this working. I see you use a depth_scale of 10 and this magic number of 6553.5, but I don't fully understand what is encoded in the depth image. To get the actual metric value, would I need to divide by 6553.5 and multiply by 10?

pablovela5620 avatar Dec 15 '23 19:12 pablovela5620

Hi @pablovela5620, 6553.5 is the scaling factor for the depth PNG image: when you load the depth image, you need to divide the pixel values by this number to get metric depth. By default, the iPhone depth image uses a pixel intensity of 65535 to represent 1 meter. When we save the depth image, we divide by 10, so 65535 / 10 = 6553.5 stored units correspond to 1 meter.
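
In other words, a quick numeric sanity check of this convention (just a sketch of the arithmetic, not repo code):

import numpy as np

png_depth_scale = 65535 / 10.0  # = 6553.5 PNG units per meter

depth_m = 1.25                                 # true metric depth in meters
stored = np.uint16(depth_m * png_depth_scale)  # value written to the 16-bit PNG (8191)
recovered = stored / png_depth_scale           # what the dataloader computes (~1.2499 m)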

Nik-V9 avatar Dec 26 '23 12:12 Nik-V9

I think the offline mode in the Nerfcapture app is broken, as pointed out by the app's developer here: jc211/NeRFCapture#10 (comment). I tried renaming the files yesterday, but that does not seem to cut it: the tensor dimensions were off, and the depth PNG files themselves looked wrong; they do not seem to store the full depth range. This is probably a bit different from how the online mode works.

Yes, this is correct. Looks like the offline mode is broken. So far, we have only used the online mode.

I'm actually trying to get this to work using our own iOS data collection app (not related to Nerfcapture or SplaTAM), see here for details, but I'm not sure if we got the depth conversion correct yet.

This looks like a cool app.

So, to summarize: if an app/script were to export RGBD data where the depth PNGs have a depth scale of 6553.5, and the camera intrinsics are correctly set in transforms.json, it should probably work?

Yes! We need depth and intrinsics in addition to RGB for SLAM. The depth scale doesn't specifically have to be 6553.5 (as long as the pixel intensity to meter scaling is known). That's what our iPhone dataloader is currently hardcoded to: https://github.com/spla-tam/SplaTAM/blob/9e74e998356e97fca060330b854b921e674c98e6/datasets/gradslam_datasets/nerfcapture.py#L49

Where does the 6553.5 number come from? I'm also trying to get this working. I see you use a depth_scale of 10 and this magic number of 6553.5, but I don't fully understand what is encoded in the depth image. To get the actual metric value, would I need to divide by 6553.5 and multiply by 10?

Because they try to save the depth array like this:

import numpy as np
from PIL import Image

def save_depth_as_png(depth, filename, png_depth_scale):
    depth = depth * png_depth_scale  # scale depth values into PNG units
    depth = depth.astype(np.uint16)  # cast to 16-bit unsigned int (0..65535)
    depth = Image.fromarray(depth)
    depth.save(filename)             # write to disk as a 16-bit grayscale PNG

When doing it like this, you need to consider that the range of uint16 is 0 to 65535 (2^16 - 1). So, I guess what they did is: first clamp the actual depth value to [0, 10.0], then multiply it by 6553.5, then convert it to uint16 without any risk of overflow (but losing some accuracy going from float to int). So, after loading the image, you just divide it by 6553.5 and get back the clamped depth value, as in this part of the code at SplaTAM/datasets/gradslam_datasets/basedataset.py:

    def _preprocess_depth(self, depth: np.ndarray):
        r"""Preprocesses the depth image by resizing, adding channel dimension, and scaling values to meters. Optionally
        converts depth from channels last :math:`(H, W, 1)` to channels first :math:`(1, H, W)` representation.

        Args:
            depth (np.ndarray): Raw depth image

        Returns:
            np.ndarray: Preprocessed depth

        Shape:
            - depth: :math:`(H_\text{old}, W_\text{old})`
            - Output: :math:`(H, W, 1)` if `self.channels_first == False`, else :math:`(1, H, W)`.
        """
        depth = cv2.resize(
            depth.astype(float),
            (self.desired_width, self.desired_height),
            interpolation=cv2.INTER_NEAREST,
        )
        if len(depth.shape) == 2:
            depth = np.expand_dims(depth, -1)
        if self.channels_first:
            depth = datautils.channels_first(depth)
        return depth / self.png_depth_scale
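
To tie the two snippets together, a small round-trip check (a sketch with made-up values; save_depth_as_png is the helper quoted above, and the resize/channel handling of _preprocess_depth is omitted):

import numpy as np

png_depth_scale = 6553.5

# made-up metric depth map, clamped to the assumed 0-10 m range
depth_m = np.clip(np.random.uniform(0.2, 12.0, size=(4, 4)), 0.0, 10.0)

# what save_depth_as_png writes into the 16-bit PNG
stored = (depth_m * png_depth_scale).astype(np.uint16)

# what _preprocess_depth recovers: divide by png_depth_scale
recovered_m = stored.astype(float) / png_depth_scale

print(np.abs(recovered_m - depth_m).max())  # quantization error < 1/6553.5 m (~0.15 mm)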

jeezrick avatar Feb 03 '24 04:02 jeezrick