
Custom images

Open · whatdhack opened this issue 4 years ago · 5 comments

@dariopavllo, congratulations on your presentation at NeurIPS 2020. Interesting work. I have a few quick questions.

  1. What exactly would be involved in using custom images to generate 3D meshes and textures? Would fine-tuning work?
  2. I am looking to generate 3D meshes from custom images and then render views of them from different directions (i.e., keeping geometry and texture constant). Would that be doable? If so, what would that pipeline look like for training and inference?

whatdhack avatar Jan 01 '21 23:01 whatdhack

Hi,

  1. You just need to retrain the model on your dataset. While fine-tuning is in principle possible with GANs, it's better to retrain from scratch for best results. The dataset must contain segmentation masks (you can use Mask R-CNN to infer them) as well as 3D poses (or, alternatively, keypoints or something else from which poses can be estimated); a minimal sketch of mask inference is included after this list.
  2. Yes, that's actually guaranteed by the model. The generator produces a full 3D mesh and a full texture, so you're free to render it from any view.
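
For reference, here is a minimal, hedged sketch of inferring per-image masks with an off-the-shelf Mask R-CNN from torchvision. The image path, the "keep the top-scoring instance" heuristic, and the 0.5 binarization threshold are illustrative assumptions, not part of convmesh:

```python
# Sketch: infer binary segmentation masks with torchvision's pretrained Mask R-CNN.
import torch
import torchvision
from torchvision.transforms.functional import to_tensor
from PIL import Image

model = torchvision.models.detection.maskrcnn_resnet50_fpn(pretrained=True)
model.eval()

@torch.no_grad()
def infer_mask(image_path):
    image = to_tensor(Image.open(image_path).convert('RGB'))
    output = model([image])[0]            # dict with 'boxes', 'labels', 'scores', 'masks'
    if len(output['scores']) == 0:
        return None                       # no detection; skip this image
    best = output['scores'].argmax()      # keep the highest-scoring instance (assumption)
    mask = output['masks'][best, 0]       # soft mask in [0, 1], shape (H, W)
    return (mask > 0.5).float()           # binarize for use as a segmentation mask

mask = infer_mask('data/images/0001.jpg')  # hypothetical path
```

In practice you would also want to filter detections by the object class and score, and drop images where the object is heavily occluded.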

dariopavllo avatar Jan 04 '21 15:01 dariopavllo

@dariopavllo thanks. So, as I understand it, there are 3 steps in training from scratch: convmesh, inverse rendering, and GAN training, in that order. So for my custom image sets, all 3 steps have to be repeated, right? As far as I can see, the keypoints are used only in the convmesh step, right?

Expanding on my second question: suppose I have an image and I want to generate a different view of it. The GAN output depends on the z input. So, how should I select z so that the GAN generates a mesh and texture that are exactly the same as (or very similar to) the image?

whatdhack avatar Jan 04 '21 17:01 whatdhack

Yes, you would need to repeat the 3 steps. Poses/keypoints are only used for the first step.

Regarding your second question, I think what you want to do is more related to a reconstruction approach (as opposed to generation). For instance, you could use CMR to obtain the 3D mesh from an image, and then render it from a different view. If you still want to do it using a GAN, one idea is to pass both the rendered image and the "target" image through a VGG network, and minimize their difference in feature space (w.r.t. the latent code z). This is what people usually do for models like StyleGAN.
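
To make that concrete, here is a rough sketch of the latent-optimization idea. The `render_from_latent(z)` call is a placeholder for convmesh's generator plus differentiable renderer (not its actual API), and the latent size, learning rate, step count, and VGG layer choice are assumptions:

```python
# Sketch: optimize the latent code z so the rendered output matches a target image
# in VGG feature space, with both the generator and VGG kept frozen.
import torch
import torchvision
from torchvision.transforms.functional import to_tensor
from PIL import Image

# Frozen VGG16 up to relu3_3 as the perceptual feature extractor
vgg = torchvision.models.vgg16(pretrained=True).features[:16].eval()
for p in vgg.parameters():
    p.requires_grad_(False)

# Target image to match (hypothetical path); normalization omitted for brevity
target = to_tensor(Image.open('target.jpg').convert('RGB')).unsqueeze(0)

z = torch.randn(1, 64, requires_grad=True)   # latent size 64 is an assumption
optimizer = torch.optim.Adam([z], lr=0.05)

for step in range(500):
    optimizer.zero_grad()
    # Placeholder: map z to a (1, 3, H, W) rendering of the generated mesh + texture
    rendered = render_from_latent(z)
    loss = torch.nn.functional.mse_loss(vgg(rendered), vgg(target))
    loss.backward()                           # gradients flow only into z
    optimizer.step()
```

Once the loss has converged, the mesh and texture produced from the optimized z can be rendered from any other camera pose.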

dariopavllo avatar Jan 05 '21 14:01 dariopavllo

@dariopavllo, thanks. The first step, convmesh, is a stripped-down version of CMR, as you pointed out in the paper, hence that alone can theoretically generate the geometry and texture for a different pose. It would be great to read your thoughts on that.

whatdhack avatar Jan 06 '21 17:01 whatdhack

Yes, you are correct! The only thing is that we don't use any perceptual losses for that step (since we don't care about texture quality -- it is thrown away anyway). If you care about texture quality, you should use CMR (or an equivalent), or improve the texture supervision in the first step of convmesh; a rough sketch of what that could look like is below.
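
For illustration only, a hedged sketch of adding a VGG-based perceptual term to the step-1 reconstruction objective could look like the following. The `rendered`, `image`, and `mask` tensors stand in for quantities already computed in the inverse-rendering loop, and the 0.1 weight is an arbitrary assumption to be tuned:

```python
# Sketch: pixel loss inside the mask plus a perceptual term on the full image.
import torch
import torchvision

# Frozen VGG16 feature extractor (up to relu3_3) reused as a perceptual metric
vgg = torchvision.models.vgg16(pretrained=True).features[:16].eval()
for p in vgg.parameters():
    p.requires_grad_(False)

def reconstruction_loss(rendered, image, mask, perceptual_weight=0.1):
    # rendered, image: (1, 3, H, W); mask: (1, 1, H, W), broadcast over channels
    pixel_term = torch.nn.functional.l1_loss(rendered * mask, image * mask)
    perceptual_term = torch.nn.functional.mse_loss(vgg(rendered), vgg(image))
    return pixel_term + perceptual_weight * perceptual_term
```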

dariopavllo avatar Jan 11 '21 17:01 dariopavllo