RePaint
How to train or finetune on custom dataset?
Have you got the training code?
You need to train using the original guided-diffusion code base.
To get the same commit as we used do the following:
git clone https://github.com/openai/guided-diffusion.git
cd guided-diffusion
git checkout 912d577
For training, we use 256 × 256 crops with a batch size of three on 4× V100 GPUs each.
Does that work for you?
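For reference, a typical guided-diffusion training invocation looks roughly like this (a sketch based on the guided-diffusion README; the flag values below are illustrative and not confirmed as the exact settings used for RePaint's checkpoints):

```shell
# Illustrative values; adjust to match face_example.yml before training.
MODEL_FLAGS="--image_size 256 --num_channels 256 --num_res_blocks 2 --learn_sigma True"
DIFFUSION_FLAGS="--diffusion_steps 1000 --noise_schedule linear"
TRAIN_FLAGS="--lr 1e-4 --batch_size 3"
python scripts/image_train.py --data_dir /path/to/your/images $MODEL_FLAGS $DIFFUSION_FLAGS $TRAIN_FLAGS
```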
@universewill Can you use your own dataset to train the model?
Could you provide the training parameters? For example, the training parameters for imagenet256?
I tried to reproduce the training on CelebA-HQ without success. I updated the guided-diffusion params using the face_example.yml file. Are those the params used for training? The loss stays at NaN for a very large number of steps. But with the default conf, it is NaN for only a few steps and then improves. What behavior is expected?
- The parameters in face_example.yml are the training parameters.
- Modifying some parameters will change the network (e.g., the U-Net structure).
@jbnjvc10000 Did you try it? When I set num_channels to 256, I often get NaN in the loss, which skips the training step, and it does not go away.
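As a side note, one generic way to keep training alive through occasional NaN losses is to skip the optimizer step whenever any gradient is non-finite, which is the kind of guard fp16 training loops commonly use. A minimal pure-Python sketch (the function name is illustrative, not from the guided-diffusion codebase):

```python
import math

def safe_update(params, grads, lr=1e-4):
    """Apply an SGD step only if every gradient is finite.

    Returns (new_params, applied). When a NaN/Inf gradient is found,
    the parameters are returned unchanged and applied is False.
    """
    if any(not math.isfinite(g) for g in grads):
        return params, False  # step skipped, weights untouched
    return [p - lr * g for p, g in zip(params, grads)], True

params = [1.0, 2.0]
new_params, applied = safe_update(params, [0.1, -0.2])       # finite grads: applied
same_params, applied_nan = safe_update(params, [float("nan"), 0.0])  # skipped
```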
@MaugrimEP Sorry, I didn't encounter your problem. Maybe you should make sure your dataset and parameters are the same as the paper provides?
@jbnjvc10000 I solved my issue. It was system-related: I tried the same code under Linux and Windows. Windows produced NaN and Linux was fine. :> classic
@MaugrimEP
By the way, it may be that a file could not be found because Windows uses different path conventions.
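If an OS-dependent path is the culprit, building paths with pathlib instead of hard-coded separators makes the same code work on Windows and Linux. A small sketch (the directory and file names here are made up, not from the repo):

```python
from pathlib import Path

# Join path components with "/" on Path objects; pathlib picks the
# correct separator for the current OS.
data_dir = Path("datasets") / "celebahq"
sample = data_dir / "img_00001.png"

# Check existence explicitly rather than letting a silent load failure
# corrupt the training data.
if not sample.exists():
    print(f"missing file: {sample}")

# as_posix() gives the forward-slash form on every OS.
print(sample.as_posix())
```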
@MaugrimEP Hello, what was the type of GPU you used to train?
I used a single 12 GB GeForce RTX 4070 GPU to train a model with guided-diffusion, on my own dataset of 120,000 high-quality human facial images (each 256×256 pixels, RGB). However, if I set num_channels to 256, then no matter what batch_size I set, the training command fails with: RuntimeError: CUDA out of memory. I assume my GPU memory isn't enough, so I am curious what type of GPU you used to train.
I will be grateful if you reply!
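One generic workaround for CUDA OOM at a fixed effective batch size is gradient accumulation: compute gradients over small micro-batches and apply one averaged update, so peak memory scales with the micro-batch rather than the full batch. guided-diffusion exposes this idea through its --microbatch flag; the sketch below just demonstrates the arithmetic in plain Python for a toy loss 0.5*(w - x)^2:

```python
def grad(w, x):
    # d/dw of the toy per-sample loss 0.5 * (w - x)^2
    return w - x

def accumulated_grad(w, batch, micro_size):
    """Sum per-sample gradients over micro-batches, then average once.

    Numerically identical to the full-batch mean gradient, but each
    micro-batch can be processed (and its activations freed) separately.
    """
    total = 0.0
    for i in range(0, len(batch), micro_size):
        micro = batch[i:i + micro_size]
        total += sum(grad(w, x) for x in micro)
    return total / len(batch)

batch = [1.0, 2.0, 3.0, 4.0]
full = sum(grad(0.0, x) for x in batch) / len(batch)   # full-batch mean gradient
accum = accumulated_grad(0.0, batch, micro_size=2)     # same value, smaller chunks
```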