
Questions regarding the AE_ART (CLA-NeRF) training

Open DJNing opened this issue 1 year ago • 5 comments

Hi, I am trying to train CLA-NeRF with the following configuration, using the data you published:

{
    "dataset_name": "sapien_multi",
    "root_dir": "data/sapien_single_scene_art", 
    "exp_name": "sapien_single_scene_articulated",
    "exp_type": "vanilla_ae_art",
    "img_wh": [320, 240],
    "white_back": true,
    "batch_size": 1,
    "num_gpus": 4
}
  1. Based on my understanding, training should enter training_step() of the class LitNeRF_AE_ART(LitModel). However, the program goes directly to validation_step(). [Screenshot 2023-08-31 11 24 23]

  2. In the method render_rays of the class LitNeRF_AE_ART(LitModel), it looks like you are collecting the rays in chunks here:

def render_rays(self, batch, latents):
    B = batch["rays_o"].shape[0]
    ret = defaultdict(list)
    for i in range(0, B, self.hparams.chunk):
        batch_chunk = dict()
        for k, v in batch.items():
            if k == 'img_wh' or k == 'src_imgs':
                continue
            if k == 'radii':
                batch_chunk[k] = v[:, i : i + self.hparams.chunk]
            else:
                batch_chunk[k] = v[i : i + self.hparams.chunk]

But the key 'radii' doesn't exist in the batch, so batch_chunk is always filled by the else branch. Here's the list of keys in the batch:

[Screenshot 2023-08-31 11 29 17]

Every key that is not mentioned in the if statements ends up in the else branch.
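For anyone who wants to reproduce the key listing, here is a minimal sketch that summarizes a batch. The helper name describe_batch and the toy batch are made up for illustration; the real keys come from the repo's dataloader.

```python
def describe_batch(batch):
    """Map each key to its shape if it has one, else to its type name."""
    summary = {}
    for key, value in batch.items():
        shape = getattr(value, "shape", None)
        summary[key] = tuple(shape) if shape is not None else type(value).__name__
    return summary


# Toy stand-in for a real batch (assumption: the real one holds tensors).
example = {"rays_o": [[0.0, 0.0, 0.0]], "img_wh": (320, 240)}
print(describe_batch(example))
print("radii" in example)  # quick check for the key in question
```

Calling this on the batch at the top of render_rays should show which entries are per-ray tensors and which are metadata.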

Could you fix the code here?

DJNing avatar Aug 31 '23 10:08 DJNing

  1. This is how PyTorch Lightning runs training: it first runs a few validation steps as a sanity check before it starts training. See the docs here
  2. This choice was made because nerf-factory's original implementation of mipnerf360 and others uses radii from the dataloader. The current training/dataloader doesn't support mipnerf360 or the others, but we can keep it as is in case we want to bring them in later on. That would entail just adding a dataloader, etc. What do you think?
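To illustrate point 1: the sanity check is controlled by the Trainer argument num_sanity_val_steps (the default is 2, and 0 turns it off). A minimal sketch follows; the helper trainer_kwargs is hypothetical, and where this repo constructs its Trainer is not shown in this thread.

```python
def trainer_kwargs(skip_sanity_check: bool = False) -> dict:
    """Build the Trainer keyword that controls the pre-training validation pass."""
    # Lightning's default is 2 sanity validation batches; 0 disables the check.
    return {"num_sanity_val_steps": 0 if skip_sanity_check else 2}


kwargs = trainer_kwargs(skip_sanity_check=True)
# With Lightning installed: trainer = pytorch_lightning.Trainer(**kwargs)
print(kwargs)
```

Leaving the check enabled is usually the safer choice, since it surfaces errors in validation_step() before a long training run starts.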

zubair-irshad avatar Aug 31 '23 15:08 zubair-irshad

Honestly, since I haven't checked the implementation of art_autodecoder, I am not sure about compatibility with the existing code. If you use radii everywhere, I think we could change the dataloader to provide it. Otherwise, change the model training logic. To summarize, I prefer making minimal changes so that the code works with the existing data.

DJNing avatar Aug 31 '23 15:08 DJNing

Does the code not work with the existing data and do you see any errors?

zubair-irshad avatar Aug 31 '23 16:08 zubair-irshad

Yes, it reports an error because there's no radii key in the batch. And this line executes for every value whose key is not mentioned in the if statement.

Could you try to fix it with the correct logic?
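One possible direction, as a sketch only and not the repo's actual fix: slice just the keys known to hold one entry per ray, and pass everything else through unchanged. The function iter_ray_chunks and the per_ray_keys argument are hypothetical names; which keys are really per-ray is an assumption that has to be checked against the dataloader. Unlike the original, this version does not special-case radii, which is sliced along its second axis there.

```python
def iter_ray_chunks(batch, chunk, per_ray_keys, skip_keys=("img_wh", "src_imgs")):
    """Yield dicts with up to `chunk` rays each; other entries pass through whole."""
    num_rays = len(batch["rays_o"])
    for start in range(0, num_rays, chunk):
        stop = start + chunk
        batch_chunk = {}
        for key, value in batch.items():
            if key in skip_keys:
                continue  # same keys the original loop skips
            if key in per_ray_keys:
                batch_chunk[key] = value[start:stop]  # slice along the ray axis
            else:
                batch_chunk[key] = value  # metadata is not sliced
        yield batch_chunk


# Toy batch with plain lists; tensors would slice the same way along dim 0.
toy = {
    "rays_o": [[0, 0, 0]] * 5,
    "rays_d": [[0, 0, 1]] * 5,
    "img_wh": (320, 240),
    "near": 0.1,
}
chunks = list(iter_ray_chunks(toy, chunk=2, per_ray_keys={"rays_o", "rays_d"}))
print([len(c["rays_o"]) for c in chunks])
```

An explicit list of per-ray keys keeps the change small and avoids slicing values that are not indexed by ray.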

DJNing avatar Sep 01 '23 09:09 DJNing

Were you able to look into this and resolve it? Apologies, I have not had much time lately due to the ICRA conference push. Please feel free to create a PR if you were able to resolve it on your end, thanks!

zubair-irshad avatar Sep 19 '23 15:09 zubair-irshad