RePaint
How to train or finetune on custom dataset?
Have you got the training code?
You need to train using the original guided-diffusion code base.
To get the same commit as we used do the following:
git clone https://github.com/openai/guided-diffusion.git
cd guided-diffusion
git checkout 912d577
For training, we use 256 × 256 crops with a batch size of three on 4× V100 GPUs each.
Does that work for you?
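For reference, a typical guided-diffusion training invocation looks roughly like this (a sketch based on the guided-diffusion README; the flag values below are illustrative and not confirmed as the exact settings used for RePaint's checkpoints):

```shell
# Illustrative values; adjust to match face_example.yml before training.
MODEL_FLAGS="--image_size 256 --num_channels 256 --num_res_blocks 2 --learn_sigma True"
DIFFUSION_FLAGS="--diffusion_steps 1000 --noise_schedule linear"
TRAIN_FLAGS="--lr 1e-4 --batch_size 3"
python scripts/image_train.py --data_dir /path/to/your/images $MODEL_FLAGS $DIFFUSION_FLAGS $TRAIN_FLAGS
```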
@universewill Can you use your own dataset to train the model?
Could you provide the training parameters? For example, the training parameters for imagenet256?
I tried to reproduce the training on CelebA-HQ without success. I updated the guided-diffusion params using the face_example.yml file. Are those the params used for training? The loss stays at NaN for a very large number of steps. But with the default conf, it is NaN for only a few steps and then improves. What behavior is expected?
- The parameters in face_example.yml are the training parameters.
- Modifying some parameters will change the network (e.g., the U-Net structure).
@jbnjvc10000 Did you try it? When I set num_channels to 256, I often get NaN in the loss, which skips the training step, and it does not go away.
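As a side note, one generic way to keep training alive through occasional NaN losses is to skip the optimizer step whenever any gradient is non-finite, which is the kind of guard fp16 training loops commonly use. A minimal pure-Python sketch (the function name is illustrative, not from the guided-diffusion codebase):

```python
import math

def safe_update(params, grads, lr=1e-4):
    """Apply an SGD step only if every gradient is finite.

    Returns (new_params, applied). When a NaN/Inf gradient is found,
    the parameters are returned unchanged and applied is False.
    """
    if any(not math.isfinite(g) for g in grads):
        return params, False  # step skipped, weights untouched
    return [p - lr * g for p, g in zip(params, grads)], True

params = [1.0, 2.0]
new_params, applied = safe_update(params, [0.1, -0.2])       # finite grads: applied
same_params, applied_nan = safe_update(params, [float("nan"), 0.0])  # skipped
```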
@MaugrimEP Sorry, I didn't encounter your problem. Maybe you should make sure your dataset and parameters are the same as the paper provides?
@jbnjvc10000 I solved my issue. It was system-related: I tried the same code under Linux and Windows. Windows produced NaN and Linux was fine. :> classic
@MaugrimEP
By the way, it may be that a file could not be found because Windows uses different path conventions.
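If an OS-dependent path is the culprit, building paths with pathlib instead of hard-coded separators makes the same code work on Windows and Linux. A small sketch (the directory and file names here are made up, not from the repo):

```python
from pathlib import Path

# Join path components with "/" on Path objects; pathlib picks the
# correct separator for the current OS.
data_dir = Path("datasets") / "celebahq"
sample = data_dir / "img_00001.png"

# Check existence explicitly rather than letting a silent load failure
# corrupt the training data.
if not sample.exists():
    print(f"missing file: {sample}")

# as_posix() gives the forward-slash form on every OS.
print(sample.as_posix())
```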
@MaugrimEP Hello, what was the type of GPU you used to train?
I used a single 12 GB GeForce RTX 4070 GPU to train a model with guided-diffusion, on my own dataset of 120,000 high-quality human facial images (each 256×256 pixels, RGB). However, if I set num_channels to 256, then no matter what batch_size I set, the training command fails with: RuntimeError: CUDA out of memory. I assume my GPU memory isn't enough, so I am curious what type of GPU you used to train.
I will be grateful if you reply!
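One generic workaround for CUDA OOM at a fixed effective batch size is gradient accumulation: compute gradients over small micro-batches and apply one averaged update, so peak memory scales with the micro-batch rather than the full batch. guided-diffusion exposes this idea through its --microbatch flag; the sketch below just demonstrates the arithmetic in plain Python for a toy loss 0.5*(w - x)^2:

```python
def grad(w, x):
    # d/dw of the toy per-sample loss 0.5 * (w - x)^2
    return w - x

def accumulated_grad(w, batch, micro_size):
    """Sum per-sample gradients over micro-batches, then average once.

    Numerically identical to the full-batch mean gradient, but each
    micro-batch can be processed (and its activations freed) separately.
    """
    total = 0.0
    for i in range(0, len(batch), micro_size):
        micro = batch[i:i + micro_size]
        total += sum(grad(w, x) for x in micro)
    return total / len(batch)

batch = [1.0, 2.0, 3.0, 4.0]
full = sum(grad(0.0, x) for x in batch) / len(batch)   # full-batch mean gradient
accum = accumulated_grad(0.0, batch, micro_size=2)     # same value, smaller chunks
```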