Training issue

Open issakh opened this issue 2 years ago • 4 comments

Hi, I've been trying to train this network on an A100 GPU. However, since torch 1.5.0 doesn't support this GPU, I am forced to use torch 1.9.0. Training is broken for torch versions > 1.5.0, but I cannot find the reason why. I have looked at the differences between the torch versions, but nothing makes it clear why this happens. Basically, the model stays stuck at around 20 dB for the duration of training. I previously tested this code on a 1080Ti with torch 1.5.0 and it worked fine, but given the memory constraints and training time, the A100 would be the better option. Do you have any idea why this occurs, and are there any possible solutions?

Thanks

issakh, Mar 22 '22 14:03
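For readers unsure what a plateau at that level means: assuming the 20 dB figure in this thread is PSNR and that pixel values are scaled to [0, 1] (the repository's own metric code may differ), the number can be turned back into an average per-pixel error. The sketch below is only an illustration of the standard PSNR formula; `psnr` and `rmse_for_psnr` are hypothetical helper names, not functions from the FLAVR code.

```python
import math


def psnr(pred, target, max_val=1.0):
    """PSNR in dB between two equal-length sequences of pixel values.

    Assumes values lie in [0, max_val]; this is the textbook definition,
    not necessarily the exact metric used by the repository.
    """
    if len(pred) != len(target) or len(pred) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all pixels.
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_val ** 2 / mse)


def rmse_for_psnr(db, max_val=1.0):
    """Root-mean-square pixel error that corresponds to a given PSNR."""
    return max_val / (10 ** (db / 20.0))


if __name__ == "__main__":
    # Every pixel off by 0.1 on a [0, 1] scale gives roughly 20 dB.
    print(psnr([0.5, 0.5], [0.4, 0.6]))
    print(rmse_for_psnr(20.0))
```

Under these assumptions, 20 dB corresponds to an RMS error of about 0.1 on a [0, 1] scale, i.e. roughly 10% of the full intensity range.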

Hi, I have the same problem as you. The test results are very good, but the PSNR stays at about 17 during training. Has your problem been solved?

weiMytian, Apr 23 '22 00:04

Hi, I have not been able to find a solution to this. I tried writing the training code for this myself, but the same issues arose when I added the validation section. I also looked at the release logs to compare torch 1.5.1 (the maximum version the FLAVR code works on) with 1.6.0, and none of the new additions or deprecations explain why training no longer works on newer torch versions.

issakh, Apr 27 '22 04:04

@issakh Can you make sure that your PyTorch version is the same as the recommended one? I did not face any such issues on my end.

tarun005, May 12 '22 00:05
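One way to answer the version question precisely is to report exactly what is installed in the environment being used for training. The following is a minimal sketch using only the Python standard library; `installed_versions` is a hypothetical helper name, and the package list is an assumption to adjust to the repository's actual requirements.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each package name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Package is not installed in this environment.
            versions[name] = None
    return versions


if __name__ == "__main__":
    # "torch" is the only package named in this thread; extend as needed.
    for name, version in installed_versions(["torch"]).items():
        print(f"{name}: {version if version is not None else 'not installed'}")
```

If torch is importable, `torch.version.cuda` and `torch.cuda.get_device_capability()` additionally report the CUDA build and the GPU's compute capability, which is useful to include when comparing a 1080Ti setup against an A100 one.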

Hi, so the problem is that I can't use the recommended version, because my GPU doesn't support the CUDA version that torch 1.5.0 requires. That is the problem I'm currently facing, and the weird thing is that I'm not sure what causes the code to break!

issakh, May 22 '22 15:05