How to fine-tune the pre-trained model on my dataset?

Open RayShing opened this issue 1 year ago • 9 comments

Hello, I would like to ask how I can load the pre-trained model and fine-tune it on my self-collected dataset.

RayShing avatar Jul 09 '24 02:07 RayShing

Hi, you can use the init_from_ckpt option.

yerfor avatar Jul 09 '24 05:07 yerfor

Thanks!

I have another question regarding the pre-trained models you provided. Specifically, you included "audio2secc_vae" and "secc2plane_torso_orig". However, in your training guidelines for audio, it is recommended to first train "audio_lm3d_syncnet" and then "audio2motion". Similarly, for motion, the guideline suggests first training "Img-to-Plane" followed by "Motion-to-Video", which includes "secc2plane_head" and "secc2plane_torso".

I am a bit confused about their relationships. Is "audio2secc_vae" equivalent to "audio2motion", and is "secc2plane_torso_orig" equivalent to "secc2plane_torso"?

For audio training, should I:

  1. Train "audio_lm3d_syncnet" myself, and then
  2. When training "audio2motion", provide the checkpoints from both my trained "audio_lm3d_syncnet" and the provided "audio2secc_vae"?

Or, do I not have to train "audio_lm3d_syncnet" at all and just provide "audio2secc_vae" for fine-tuning?

Similarly, for Motion-to-Video training, should I:

  1. Train "Img-to-Plane" myself
  2. Train "secc2plane_head" myself, based on trained "Img-to-Plane"
  3. When training "secc2plane_torso", provide the checkpoints from both my trained "secc2plane_head" and the provided "secc2plane_torso_orig"?

But it seems we can only set one checkpoint for "init_from_ckpt"?

Additionally, does "secc2plane_head" imply inferring only the head area without the torso?

Thank you so much for your help!

RayShing avatar Jul 09 '24 06:07 RayShing

  1. Yes, "audio2secc_vae" is equivalent to "audio2motion", and "secc2plane_torso_orig" is equivalent to "secc2plane_torso".
  2. For audio training: yes, you need to train a syncnet yourself.
  3. You can skip the image-to-plane pre-training and go through init_from_ckpt => secc2plane_head => secc2plane_torso.
  4. Does "secc2plane_head" imply inferring only the head area without the torso? ==> Yes.

yerfor avatar Jul 09 '24 07:07 yerfor

Thank you so much for your response! I am still a bit confused about this step:

  3. You can skip the image-to-plane pre-training, and go through the init_from_ckpt => secc2plane_head => secc2plane_torso.

Where can we get the pre-trained model for image-to-plane? It appears that currently, we only have the pre-trained models for "audio2motion" and "secc2plane_torso".

Additionally, I noticed that during evaluation, the human figure changes each time instead of using the one I provided. Where is this part of the setup, and how can we modify it to use my provided human figure?

Thank you for your time!

RayShing avatar Jul 09 '24 14:07 RayShing

You can use the provided pre-trained secc2plane_torso checkpoint to initialize your own secc2plane_head model; just set strict=False.

To use your own human figure, please modify the code in validation_steps.

yerfor avatar Jul 09 '24 14:07 yerfor
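
Note: for readers unsure what strict=False changes, here is a minimal plain-Python sketch of the key matching that non-strict loading performs. It is not the repository's loading code. The helper `load_non_strict` and all parameter names are hypothetical, and plain dicts stand in for state dicts. In PyTorch the corresponding call is `model.load_state_dict(state_dict, strict=False)`, which skips missing and unexpected keys but still raises an error when a matching key has a different tensor shape.

```python
def load_non_strict(model_state, ckpt_state):
    """Copy checkpoint entries into model_state wherever the names match.

    Returns (missing_keys, unexpected_keys), loosely mirroring what
    PyTorch reports after load_state_dict(..., strict=False).
    """
    # Parameters the model has but the checkpoint does not provide.
    missing = [k for k in model_state if k not in ckpt_state]
    # Checkpoint entries the model has no slot for.
    unexpected = [k for k in ckpt_state if k not in model_state]
    for name, value in ckpt_state.items():
        if name in model_state:
            model_state[name] = value
    return missing, unexpected


# Hypothetical parameter names, for illustration only.
head_model = {"backbone.weight": 0.0, "head_decoder.weight": 0.0}
torso_ckpt = {"backbone.weight": 1.5, "torso_decoder.weight": 2.5}

missing, unexpected = load_non_strict(head_model, torso_ckpt)
print(missing)     # parameters left at their initial values
print(unexpected)  # checkpoint entries that were ignored
```

Under this sketch, the shared weights come from the torso checkpoint, while anything listed as missing keeps its fresh initialization and has to be learned during fine-tuning.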

Thank you for your reply!

I have modified the training logic. However, when I tried to train the secc2plane_head model on my 4090 GPU, I ran into an out-of-memory (OOM) error. Is there any way to reduce the GPU memory requirement during training? I tried reducing "num_workers", but it did not help.

RayShing avatar Jul 09 '24 16:07 RayShing

You can reduce the batch_size, or you can try amp=True

yerfor avatar Jul 09 '24 18:07 yerfor

@yerfor Hi, thank you so much for your wonderful work. Could you also release a publicly available syncnet model, so that we can fine-tune on our own datasets much more easily?

moliq1 avatar Aug 19 '24 08:08 moliq1

@felixshing Hello, I would like to inquire about your experience. Did you achieve the desired results? I have about 10 minutes of training data for each character on my end. Is that enough?

jupinter avatar Feb 06 '25 02:02 jupinter