Hello, do you know: if I train on another dataset (HR size = 480x480), which is different from yours, should I change "HR_size": 128 in train_spsr.json?
If so, once I change the 128 to 480 I can't run the training code. It ends up with something like:
23-06-17 14:54:34.882 - INFO: Start training from epoch: 0, iter: 0
Traceback (most recent call last):
  File "train.py", line 182, in <module>
    main()
  File "train.py", line 105, in main
    model.optimize_parameters(current_step)
  File "/home//SPSR-master/code/models/SPSR_model.py", line 282, in optimize_parameters
    pred_g_fake = self.netD(self.fake_H)
  File "/home//anaconda3/envs/RRSGAN37/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home//anaconda3/envs/RRSGAN37/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 168, in forward
    outputs = self.parallel_apply(replicas, inputs, kwargs)
  File "/home//anaconda3/envs/RRSGAN37/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 178, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File "/home//anaconda3/envs/RRSGAN37/lib/python3.7/site-packages/torch/nn/parallel/parallel_apply.py", line 86, in parallel_apply
    output.reraise()
  File "/home//anaconda3/envs/RRSGAN37/lib/python3.7/site-packages/torch/_utils.py", line 457, in reraise
    raise exception
RuntimeError: Caught RuntimeError in replica 0 on device 0.
Original Traceback (most recent call last):
  File "/home//anaconda3/envs/RRSGAN37/lib/python3.7/site-packages/torch/nn/parallel/parallel_apply.py", line 61, in _worker
    output = module(*input, **kwargs)
  File "/home//anaconda3/envs/RRSGAN37/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home//SPSR-master/code/models/modules/architecture.py", line 247, in forward
    x = self.classifier(x)
  File "/home//anaconda3/envs/RRSGAN37/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home//anaconda3/envs/RRSGAN37/lib/python3.7/site-packages/torch/nn/modules/container.py", line 141, in forward
    input = module(input)
  File "/home//anaconda3/envs/RRSGAN37/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home//anaconda3/envs/RRSGAN37/lib/python3.7/site-packages/torch/nn/modules/linear.py", line 103, in forward
    return F.linear(input, self.weight, self.bias)
RuntimeError: mat1 and mat2 shapes cannot be multiplied (1x115200 and 8192x100)
Note that in this error, the 1 in '1x115200' comes from my batch size: with the default batch size I run out of CUDA memory, so I had to set batch_size to 1.
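My guess (please correct me if I'm wrong) is that the 8192 belongs to the discriminator's fixed-size Linear classifier, which looks like it was built for 128x128 crops, while 115200 is what a 480x480 crop produces after the conv stages. Here is a rough sanity check of that arithmetic, assuming 512 channels and 5 stride-2 downsamplings before the flatten, as in a VGG-128-style discriminator (I haven't verified these numbers against architecture.py, so they are only an assumption):

```python
# Rough check of the flattened feature size that reaches the discriminator's
# first Linear layer. The channel count (512) and the number of stride-2
# downsamplings (5) are assumptions, not taken from the repo.
def flattened_features(hr_size, channels=512, downsamplings=5):
    spatial = hr_size // (2 ** downsamplings)  # spatial size after the conv stages
    return channels * spatial * spatial

print(flattened_features(128))  # 8192   -> matches the Linear weight (8192x100)
print(flattened_features(480))  # 115200 -> matches my input (1x115200)
```

If that's right, it would explain why only the default HR_size of 128 works with the current discriminator.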
If I don't change HR_size (default = 128), then the training code runs. But I don't know whether training with 128x128 crops would be appropriate for my dataset (HR size = 480x480)?
I'm looking forward to your reply!
THANKS!