
cam_scale_factor parameter

povolann opened this issue 2 years ago • 26 comments

Hello, I tried searching and looked through the code, but I am kind of lost. What exactly is the cam_scale_factor? Thank you for your answer!

povolann avatar Sep 06 '22 06:09 povolann

Hi, did you figure out what this parameter is?

Learningm avatar Nov 02 '22 02:11 Learningm

Hi, not really, but I didn't play around with it.

povolann avatar Nov 02 '22 07:11 povolann

Got it. BTW, have you run into the "floaters" problem when training on your custom data?

Learningm avatar Nov 02 '22 07:11 Learningm

Yes, I still have floaters, but the results are pretty good for my purpose, especially after I played around with the parameters lambda_tv, lambda_tv_sh, and background_brightness.

povolann avatar Nov 02 '22 08:11 povolann

Thanks! I will try your suggestions.

Learningm avatar Nov 02 '22 08:11 Learningm

In most NeRFs, you need to make sure your region of interest is in a "controlled region" of space. You cannot really predict where COLMAP will put your poses. Therefore, most algorithms (1) recenter the poses (e.g. they compute the average of all poses and assume this is the center, or, for non-centered captures, they compute some form of "look-at" point) and (2) rescale all the poses to a certain radius. The cam_scale_factor refers to that. It's mentioned in section 3.6 of the paper: "we pre-scale the inner scene to be approximately contained in the unit sphere".
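For intuition, here is a minimal sketch of that recenter-and-rescale step, assuming (N, 4, 4) camera-to-world matrices from COLMAP; normalize_poses and target_radius are illustrative names, not this repo's API:

```python
import numpy as np

def normalize_poses(poses, target_radius=1.0):
    """Recenter camera poses at their centroid and rescale so every
    camera center lies within target_radius of the origin."""
    poses = poses.copy()
    centroid = poses[:, :3, 3].mean(axis=0)      # (1) recenter at the average pose
    poses[:, :3, 3] -= centroid
    radius = np.linalg.norm(poses[:, :3, 3], axis=1).max()
    poses[:, :3, 3] *= target_radius / radius    # (2) rescale to the target radius
    return poses
```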

reconlabs-sergio avatar Nov 02 '22 08:11 reconlabs-sergio

I ran into another problem: the surface of my custom object is blurry. Have you seen similar issues?

The "floaters" get better when I tune the parameters mentioned above.

Learningm avatar Nov 04 '22 03:11 Learningm

Hm, it depends on what kind of blur. Can you share an example?

povolann avatar Nov 04 '22 05:11 povolann

Hi, see the video below; the object surface is very blurry.

https://user-images.githubusercontent.com/13192241/199918064-e2490803-6f67-4526-9df4-51eb3c516c04.mp4

Learningm avatar Nov 04 '22 07:11 Learningm

> In most NeRFs, you need to make sure your region of interest is in a "controlled region" of space. […] It's mentioned in section 3.6 of the paper: "we pre-scale the inner scene to be approximately contained in the unit sphere".

Cool! A follow-up question: why do we need to set cam_scale_factor at all? In this implementation it's set to 0.9 or 0.95, but after computing the average of all poses a scale is already applied when using the default NSVF dataset format. If we set cam_scale_factor to 1.0, nothing special seems to happen; is that right?

Learningm avatar Nov 08 '22 03:11 Learningm

I haven't checked the code in detail, but I assume they just make the poses "a tiny bit" smaller than the unit sphere, to be safe. I don't think it's a big deal. What might be important is to take this number into account if you have any scale-dependent components in your pipeline, e.g. if you want to reconstruct a physically accurate object.
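In code terms, that safety margin would just be one extra multiply after the normalization sketched earlier (a hypothetical continuation of that sketch; 0.9 mirrors the default mentioned in this thread):

```python
# Shrink the normalized poses slightly so the scene sits safely
# inside the unit sphere rather than touching its boundary.
cam_scale_factor = 0.9
poses[:, :3, 3] *= cam_scale_factor
```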

reconlabs-sergio avatar Nov 08 '22 03:11 reconlabs-sergio

From your video, it would seem your problem could be related to inaccurate poses. Have you tried putting some feature-rich items on the table (stickers, QR codes, newspapers)?

reconlabs-sergio avatar Nov 08 '22 03:11 reconlabs-sergio

> From your video, it would seem your problem could be related to inaccurate poses. Have you tried putting some feature-rich items on the table (stickers, QR codes, newspapers)?

I also think this might be because of inaccurate poses. I tried swapping the poses between 2 similar datasets and got similar rendering results.

But in that regard, is there any advice on how to change the COLMAP parameters when this happens and I can't retake my photos? Or on what to change when COLMAP registers fewer photos than the dataset actually contains?

povolann avatar Nov 08 '22 05:11 povolann

> From your video, it would seem your problem could be related to inaccurate poses. Have you tried putting some feature-rich items on the table (stickers, QR codes, newspapers)?

No, I haven't tried adding manual feature patterns while taking the photos. I double-checked and visualized the features extracted by COLMAP; the object in the video has few feature points on its smooth, white surface, so inaccurate poses seem to be the cause. As @povolann mentioned, how can I get more accurate camera poses from COLMAP without retaking my photos?

Learningm avatar Nov 08 '22 07:11 Learningm

> I haven't checked the code in detail, but I assume they just make the poses "a tiny bit" smaller than the unit sphere, to be safe. […]

I see. Another question: can we adjust the "control region" in this repo the way other implementations do with aabb_scale? I am reading the code in detail and want to shrink the "control region" (the SparseGrid in this repo) to a smaller volume so that it focuses more on the foreground. But the SparseGrid seems to be normalized already (center [0, 0, 0], radius [1, 1, 1]), i.e. a [-1, 1] bounding box covering the whole scene.
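For what it's worth, a hedged sketch of what shrinking that region might look like, assuming SparseGrid's center and radius constructor arguments work as their names suggest (the values here are illustrative, not tuned):

```python
import svox2

# Shrink the grid's world-space extent so the voxel budget concentrates
# on the foreground object instead of the whole default [-1, 1] cube.
grid = svox2.SparseGrid(
    reso=[256, 256, 256],
    center=[0.0, 0.0, 0.0],  # keep the object-centered origin
    radius=[0.5, 0.5, 0.5],  # half the default radius: 8x voxel density on the object
    device='cuda',
)
```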

Learningm avatar Nov 08 '22 07:11 Learningm

> > From your video, it would seem your problem could be related to inaccurate poses. […]
>
> No, I haven't tried adding manual feature patterns while taking the photos. […] As @povolann mentioned, how can I get more accurate camera poses from COLMAP without retaking my photos?

The only things I can think of are:

  • Evaluating why your poses are bad: are your pictures blurry? Are they downsampled? If so, maybe try some AI-based upsampling/deblurring algorithm, but it's a long shot.
  • Using commercial software to compute the poses (e.g. RealityCapture).

But in any case, your results will probably be bound by the garbage-in, garbage-out principle if your original images don't have enough quality. Otherwise, depending on your purpose, you may want to try another algorithm (NeRFactor, INGP, Nerfstudio).

I don't have enough experience with this code to answer your AABB question. What you say sounds reasonable; have you also tried increasing the grid resolution?

reconlabs-sergio avatar Nov 08 '22 08:11 reconlabs-sergio

> The only things I can think of are:
>
> • Evaluating why your poses are bad […]
> • Using commercial software to compute the poses (e.g. RealityCapture).
>
> […] What you say sounds reasonable; have you also tried increasing the grid resolution?

Thanks for your quick reply. The poses should be evaluated with some other method to check whether they are accurate enough; I'll think about that later.

I tried increasing the grid resolution, but as I understand it, the resolution applies to the bounding box covering the whole scene: [128, 128, 128] or [256, 256, 256] just divides the whole scene into 128³ or 256³ small voxels. It has no direct relationship to the "control region" (the center object).

That said, the center object may occupy only a small proportion of those voxels: when the whole scene consists of 128³ (about 2 million) voxels, the object may occupy 10k or so. There also doesn't seem to be a way to visualize voxel occupancy the way an octree does. I wonder whether my understanding is correct.
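One way to sanity-check that intuition on a trained grid would be something like the following; this assumes svox2's SparseGrid layout, where grid.links >= 0 marks allocated voxels and grid.density_data holds their sigma values (the 5.0 threshold is arbitrary):

```python
# grid: a trained svox2.SparseGrid
allocated = grid.links >= 0
sigma = grid.density_data[grid.links[allocated].long()]
occupied = (sigma > 5.0).sum().item()  # arbitrary density threshold
print(f"{allocated.sum().item()} / {grid.links.numel()} voxels allocated, "
      f"{occupied} above the density threshold")
```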

Learningm avatar Nov 08 '22 09:11 Learningm

@Learningm @povolann sorry to bother you. I also encountered lots of floaters when training on my custom data. I tried the parameters you mentioned above; they help, but the results are still not very good. So I wonder, besides the lambda_tv parameters, have you tried lambda_beta and lambda_sparsity? They seem more related to floaters, as they directly push sigma toward either 0 or 1.
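For reference, here are the two priors as the Plenoxels paper describes them; this is a plain-PyTorch sketch, not the repo's actual (fused CUDA) implementation:

```python
import torch

def sparsity_loss(sigma):
    """Cauchy sparsity prior: pushes voxel densities toward zero."""
    return torch.log(1.0 + 2.0 * sigma ** 2).mean()

def beta_loss(fg_weight, eps=1e-6):
    """Beta prior from Neural Volumes: pushes each ray's accumulated
    foreground weight toward 0 or 1 (fully transparent or fully opaque)."""
    w = fg_weight.clamp(eps, 1.0 - eps)
    return (torch.log(w) + torch.log(1.0 - w)).mean()
```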

Wuziyi616 avatar Nov 15 '22 02:11 Wuziyi616

> So I wonder, besides the lambda_tv parameters, have you tried lambda_beta and lambda_sparsity? […]

I have tried lambda_beta and lambda_sparsity. Since the default values are already small, I guessed they should be larger to have an effect on the foreground and background; however, 10x or 100x the defaults always gives worse results.

Learningm avatar Nov 15 '22 03:11 Learningm

@Learningm Thanks for your reply! Interesting, I also feel that 1e-5 and 1e-11 are too small, so maybe I should increase them. But it seems that in your experiments increasing them didn't lead to better results? That's confusing, hmm.

Wuziyi616 avatar Nov 15 '22 03:11 Wuziyi616

@Wuziyi616 I got the blurry-edge problem shown in the video posted above, and it seems hard to solve by tuning parameters. Did you get good results without the blur on your data?

Learningm avatar Nov 15 '22 06:11 Learningm

I'm training on CO3D, which has good camera poses, so my object surfaces all look good; just some floaters.

https://user-images.githubusercontent.com/37072215/201841878-1aac4ead-527c-4cbe-9663-0764b3782faa.mp4

Wuziyi616 avatar Nov 15 '22 06:11 Wuziyi616

@Wuziyi616 Your result looks good. I got a similar result after tuning the parameters mentioned above.

Learningm avatar Nov 16 '22 07:11 Learningm

Hi @Learningm, I use this codebase. They've tuned the parameters very well. I think that, compared to the CO3D setting here, they increase the sparsity-loss weight by 10x.

Wuziyi616 avatar Nov 16 '22 17:11 Wuziyi616

@Wuziyi616 Cool! Thanks for sharing.

Learningm avatar Nov 18 '22 08:11 Learningm

Yeah, sorry, I have not maintained this codebase very much at all and will try to do some things when I have time. This parameter simply scales the overall scene directly, on top of the normalization method. While our background model allows for modelling unbounded scenes, it still matters which portion of the scene falls inside the inner (foreground) region:

  • A better camera normalization scheme can help, like in instant-ngp. This is implemented in util/util.py and sometimes does not work well in the current code.
  • Our COLMAP should be run with --noradial.
  • Also note that custom.json can have too large a near clip, and custom_alt.json can do better; adjusting the near clip to something in between may be ideal.
  • Adding the distortion loss from mip-NeRF 360 and switching to exp activations are other possible improvements (a sketch of the distortion loss follows below).
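For reference, a hedged PyTorch sketch of the mip-NeRF 360 distortion loss (eq. 15 of that paper), with illustrative tensor names; it is not part of this repo:

```python
import torch

def distortion_loss(weights, midpoints, intervals):
    """mip-NeRF 360 distortion loss: penalizes rays whose density is
    spread out along the ray, which suppresses floaters.

    weights:   (N_rays, N_samples) volume-rendering weights w_i
    midpoints: (N_rays, N_samples) normalized sample midpoints s_i
    intervals: (N_rays, N_samples) normalized interval lengths
    """
    # Pairwise term: sum_ij w_i * w_j * |s_i - s_j|
    dists = (midpoints[..., :, None] - midpoints[..., None, :]).abs()
    pairwise = (weights[..., :, None] * weights[..., None, :] * dists).sum(dim=(-1, -2))
    # Self term: (1/3) * sum_i w_i^2 * interval_i
    self_term = (weights.square() * intervals).sum(dim=-1) / 3.0
    return (pairwise + self_term).mean()
```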

sxyu avatar Jan 15 '23 06:01 sxyu