Questions about CorrDiff example
My machine runs into an "out of memory" error.
I have changed "total_batch_size" to 2, used "amp-fp16", and restricted the dataset to ten days of data ('2018-01-01' to '2018-01-10').
Are there any other settings I can change to reduce the model's memory usage? Also, what does "training_duration: 200000000" mean?
I have solved the problem: my machine only supports "batch_size_per_gpu" = 1. However, what does "training_duration: 200000000" mean, and what does it primarily affect?
@MyGitHub-G training_duration is the total number of samples/images (counting repeats) that the model sees during training, so it controls how long training runs. Dividing it by the number of unique samples in the dataset gives the number of epochs.
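As a rough illustration of that arithmetic, here is a minimal sketch. The helper names are hypothetical (not part of the CorrDiff code), and the sample count of 240 is an assumption for illustration only (e.g. ten days of hourly data); substitute the real number of unique samples in your dataset.

```python
def epochs_from_duration(training_duration: int, num_unique_samples: int) -> float:
    """Hypothetical helper: approximate number of passes over the dataset."""
    return training_duration / num_unique_samples


def duration_for_epochs(target_epochs: int, num_unique_samples: int) -> int:
    """Hypothetical helper: training_duration needed for a target number of epochs."""
    return target_epochs * num_unique_samples


if __name__ == "__main__":
    duration = 200_000_000
    # Assumption: 240 unique samples (e.g. 10 days x 24 hourly snapshots).
    n_samples = 240
    print(f"epochs ~ {epochs_from_duration(duration, n_samples):,.0f}")
    # Choosing a duration that gives 100 epochs on the same small dataset:
    print(f"training_duration for 100 epochs = {duration_for_epochs(100, n_samples):,}")
```

With a small dataset, the default value corresponds to a very large number of epochs, so lowering training_duration mainly shortens total training time rather than reducing memory usage.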