
Training

Armstrong-lsw opened this issue · 5 comments

I'm trying to train this network, but it does not converge easily and the result is bad. I used the edge-connect code with your Inpaint_Color_Net (partial conv). The input is (masked_rgb, masked_edge, context, mask) and the label is rgb over (mask + context), resized to 256x256. I used the mask/context pool from your paper (built from the depth gaps in your code) to randomly crop pictures as training data. See the result below.

[image] From left to right: label_rgb; masked_rgb; masked_edge; label_edge (unused); generated_rgb; synthesized_rgb (mask x generated_rgb + context x label_rgb)
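For reference, the input/label assembly described above could look roughly like this (a minimal sketch with hypothetical names, assuming numpy arrays, a float HxWx3 image, and binary HxW masks):

```python
import numpy as np

def build_sample(rgb, edge, context, mask):
    """Assemble one training sample as described above.

    rgb:     HxWx3 float image; edge: HxW edge map;
    context: HxW binary mask of known pixels;
    mask:    HxW binary mask of pixels to inpaint.
    """
    keep = context[..., None]                      # visible pixels only
    masked_rgb = rgb * keep                        # zero out the synthesis region
    masked_edge = (edge * context)[..., None]
    net_input = np.concatenate(
        [masked_rgb, masked_edge, context[..., None], mask[..., None]],
        axis=-1)                                   # HxWx6 network input
    label = rgb * (mask + context)[..., None]      # supervise mask + context only
    return net_input, label
```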

Is my training procedure correct?

Armstrong-lsw · Jun 05 '20

Hi, @Armstrong-lsw. Here are some suggestions for training the rgb inpainting network.

  1. We use the depth edge as guidance, while you use the color edge as guidance.
  2. There is a weird ring surrounding the mask region.
  3. We follow this paper to design the loss function of the rgb inpainting (not Edge-Connect).

ShihMengLi · Jun 08 '20

> Hi, @Armstrong-lsw. Here are some suggestions for training the rgb inpainting network.
>
> 1. We use the depth edge as guidance, while you use the color edge as guidance.
> 2. There is a weird ring surrounding the mask region.
> 3. We follow this paper to design the loss function of the rgb inpainting (not Edge-Connect).

Hi, @ShihMengLi
Reply:

  1. Weird ring? The ring surrounding the synthesis region appears because the network has not converged; the ring surrounding the whole (synthesis + context) region is because the input itself is masked to (synthesis + context).
  2. I use the partial-conv net, not Edge-Connect.

New question:

I used your pretrained models to fine-tune the rgb-inpainting, depth-edge, and depth-inpainting networks, but the result is bad: the PSNR for depth is negative. Is that because of the line `depth = 1. / np.maximum(disp, 0.05)`, which gives 0 < depth <= 20? The most important reason may be that I use my own RGBD data (only 3000 pictures) instead of MSCOCO with MiDaS-predicted depth.
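As a quick sanity check on that line (a sketch with a random stand-in for the MiDaS disparity): the clamp at 0.05 bounds depth above by 1/0.05 = 20, so if PSNR is computed with a peak value of 1 while errors live on a 1-to-20 scale, the MSE can easily exceed 1 and the PSNR goes negative; normalizing the depth first avoids this.

```python
import numpy as np

disp = np.random.rand(256, 256)        # stand-in for normalized MiDaS disparity
depth = 1. / np.maximum(disp, 0.05)    # the line quoted above
print(depth.min(), depth.max())        # max is bounded by 1 / 0.05 = 20
```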

Armstrong-lsw · Jun 08 '20

Hi @Armstrong-lsw,

  • Re the weird ring ("the ring surrounding the (synthesis + context) region is because of the masked input"): okay, I assume you treat that ring as the context region. Then maybe you could thicken that ring (e.g. dilate it 30 times) without overwriting the synthesis region; see the sketch below.
  • Our depth inpainting model synthesizes the depth value in log scale, so you need the following pre-processing (a sketch follows this list):
    1. Convert the depth map into log scale.
    2. Calculate the mean depth value within the context region and subtract it from the depth map.
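A minimal sketch of the dilation suggested above, using OpenCV; `context_ring` and `synthesis` are hypothetical binary masks standing in for the regions discussed:

```python
import cv2
import numpy as np

# Placeholder masks: 1 = inside the region, 0 = outside.
context_ring = np.zeros((256, 256), np.uint8)
synthesis = np.zeros((256, 256), np.uint8)

kernel = np.ones((3, 3), np.uint8)
thick_ring = cv2.dilate(context_ring, kernel, iterations=30)  # thicken the ring
thick_ring = thick_ring * (1 - synthesis)  # never overwrite the synthesis region
```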

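And a minimal sketch of the two-step depth pre-processing above, assuming a strictly positive HxW depth map and a binary context mask (the function name is ours):

```python
import numpy as np

def preprocess_depth(depth, context):
    """Log scale + context-mean subtraction, per the two steps above."""
    log_depth = np.log(depth)                  # step 1: convert to log scale
    mean_ctx = log_depth[context > 0].mean()   # step 2: mean over the context
    return log_depth - mean_ctx, mean_ctx      # keep the mean to invert later

# To map a prediction back to metric depth: np.exp(pred + mean_ctx)
```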
ShihMengLi · Jun 08 '20


Thanks for your reply! I already use the log-mean pre-processing in my code. Maybe it is just my small amount of data: it's hard to train this network when the masked regions of interest (synthesis + context) come from a small number of pictures. Here is one of my mask sets, synthesis (red) & context (blue); I crop images only in this pair of regions.

[image]

Armstrong-lsw · Jun 09 '20

Hi, @ShihMengLi. Following your paper, I'm building the mask dataset on MSCOCO, which has nearly 120 thousand pictures, so with 3 random masks per picture it will yield about 360 thousand masks. I use the mask from `mesh.py -> context_and_holes -> depth_inpainting.depth_feat_model.forward_3P(resize_mask, ...)` to generate the mask library, at about 65 seconds per picture on my server, so it will take approximately 2000 hours. Is there anything wrong?
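For what it's worth, the arithmetic checks out (120,000 x 65 s is roughly 2167 hours on one worker). Since each picture is processed independently, the usual workaround is to shard the work; below is a minimal sketch where `generate_masks` is a hypothetical wrapper around the `forward_3P` call quoted above (with a GPU-bound model you would shard across GPUs or machines rather than CPU processes):

```python
from multiprocessing import Pool

# Back-of-the-envelope check of the estimate above.
n_images, sec_per_image = 120_000, 65
print(n_images * sec_per_image / 3600)     # ~2167 hours on a single worker

def generate_masks(path):
    # hypothetical wrapper around the forward_3P call quoted above
    pass

if __name__ == "__main__":
    image_paths = []                       # fill with the MSCOCO file list
    with Pool(16) as pool:                 # 16 workers -> roughly 135 hours
        pool.map(generate_masks, image_paths)
```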

Armstrong-lsw · Jun 11 '20