
Training Error

DanChen001 opened this issue on Nov 05 '18 · 4 comments

Hi,

Thank you for sharing the code.

I encountered the following error while training the network. Do you know the reason? Thanks.

#####################

Namespace(batchSize=128, clip=0.4, cuda=True, gpus='0', lr=0.1, momentum=0.9, nEpochs=50, pretrained='', resume='', start_epoch=1, step=10, threads=1, weight_decay=0.0001)
=> use gpu id: '0'
Random Seed: 3131
===> Loading datasets
===> Building model
C:\ProgramData\Anaconda3\lib\site-packages\torch\nn\functional.py:52: UserWarning: size_average and reduce args will be deprecated, please use reduction='sum' instead.
  warnings.warn(warning.format(ret))
===> Setting GPU
===> Setting Optimizer
===> Training
Epoch = 1, lr = 0.1
Traceback (most recent call last):
  File "main_vdsr.py", line 130, in <module>
    main()
  File "main_vdsr.py", line 85, in main
    train(training_data_loader, optimizer, model, criterion, epoch)
  File "main_vdsr.py", line 103, in train
    for iteration, batch in enumerate(training_data_loader, 1):
  File "C:\ProgramData\Anaconda3\lib\site-packages\torch\utils\data\dataloader.py", line 501, in __iter__
    return _DataLoaderIter(self)
  File "C:\ProgramData\Anaconda3\lib\site-packages\torch\utils\data\dataloader.py", line 289, in __init__
    w.start()
  File "C:\ProgramData\Anaconda3\lib\multiprocessing\process.py", line 112, in start
    self._popen = self._Popen(self)
  File "C:\ProgramData\Anaconda3\lib\multiprocessing\context.py", line 223, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "C:\ProgramData\Anaconda3\lib\multiprocessing\context.py", line 322, in _Popen
    return Popen(process_obj)
  File "C:\ProgramData\Anaconda3\lib\multiprocessing\popen_spawn_win32.py", line 65, in __init__
    reduction.dump(process_obj, to_child)
  File "C:\ProgramData\Anaconda3\lib\multiprocessing\reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
TypeError: can't pickle _thread._local objects

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\ProgramData\Anaconda3\lib\multiprocessing\spawn.py", line 105, in spawn_main
    exitcode = _main(fd)
  File "C:\ProgramData\Anaconda3\lib\multiprocessing\spawn.py", line 115, in _main
    self = reduction.pickle.load(from_parent)
EOFError: Ran out of input

DanChen001 · Nov 05 '18 23:11

Hi @twtygqyy, I solved this problem as suggested in https://github.com/twtygqyy/pytorch-vdsr/issues/11. Thanks @ZhaoJinHA.

However, I do not know the reason. Do you know why? Thanks.

DanChen001 · Nov 06 '18 00:11
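For anyone hitting the same TypeError: on Windows, the DataLoader starts its worker processes with spawn, so the whole dataset object has to be pickled and sent to each worker; if the dataset keeps an unpicklable member (for example an h5py file handle opened in __init__), worker start-up fails exactly like this. Below is a minimal sketch of one workaround, assuming an HDF5-backed dataset similar to the repo's DatasetFromHdf5; the class name LazyHdf5Dataset and the keys "data"/"label" are illustrative.

```python
# A sketch, not the repo's code: open the HDF5 file lazily so the dataset
# object stays picklable for spawned DataLoader workers on Windows.
import h5py
import torch
import torch.utils.data as data


class LazyHdf5Dataset(data.Dataset):
    def __init__(self, file_path):
        super(LazyHdf5Dataset, self).__init__()
        self.file_path = file_path
        self.h5 = None  # opened on first access, once per worker process
        with h5py.File(file_path, "r") as hf:
            self.length = hf["data"].shape[0]

    def __getitem__(self, index):
        if self.h5 is None:
            self.h5 = h5py.File(self.file_path, "r")
        patch = torch.from_numpy(self.h5["data"][index]).float()
        target = torch.from_numpy(self.h5["label"][index]).float()
        return patch, target

    def __len__(self):
        return self.length
```

The quicker workaround is simply to run main_vdsr.py with --threads 0 (the Namespace above shows threads=1), so the DataLoader spawns no worker processes at all and nothing needs to be pickled.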

For testing, I have the following problem. Do you know why? Thanks @twtygqyy

#####################

=> use gpu id: '0'
C:\ProgramData\Anaconda3\lib\site-packages\torch\serialization.py:425: SourceChangeWarning: source code of class 'torch.nn.modules.container.Sequential' has changed. you can retrieve the original source code by accessing the object's source attribute or set torch.nn.Module.dump_patches = True and use the patch tool to revert the changes.
  warnings.warn(msg, SourceChangeWarning)
Traceback (most recent call last):
  File "eval.py", line 33, in <module>
    model = torch.load(opt.model, map_location=lambda storage, loc: storage)["model"]
  File "C:\ProgramData\Anaconda3\lib\site-packages\torch\serialization.py", line 358, in load
    return _load(f, map_location, pickle_module)
  File "C:\ProgramData\Anaconda3\lib\site-packages\torch\serialization.py", line 542, in _load
    result = unpickler.load()
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 918: ordinal not in range(128)

DanChen001 · Nov 06 '18 01:11

Hi @DanChen001, the problem is due to the Python version. Please refer to https://github.com/twtygqyy/pytorch-vdsr/issues/21#issuecomment-372150250 for the solution. And I think the first issue you mentioned is due to training with multiple GPUs and testing with a single GPU.

twtygqyy · Nov 06 '18 01:11
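For context, that UnicodeDecodeError is the classic symptom of loading a checkpoint pickled under Python 2 into Python 3. Below is a minimal sketch of one common workaround, assuming a PyTorch release where torch.load forwards extra keyword arguments to pickle (roughly 1.0 and later); the checkpoint path is illustrative.

```python
# A sketch, assuming torch.load accepts pickle keyword arguments
# (PyTorch >= 1.0); "model/model_epoch_50.pth" is an illustrative path.
import torch

checkpoint = torch.load(
    "model/model_epoch_50.pth",
    map_location=lambda storage, loc: storage,  # load tensors onto CPU
    encoding="latin1",                          # decode Python 2 pickled byte strings
)
model = checkpoint["model"]  # eval.py reads the whole model from the "model" key
```

If the installed PyTorch is older and does not accept the encoding keyword, re-saving the checkpoint under the same Python major version used for training (or retraining under Python 3) sidesteps the mismatch, consistent with the python-version explanation above.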


@twtygqyy Thanks. I will try.

DanChen001 · Nov 06 '18 02:11