elenakovacic
> I also tried to write the training code, but when I run the training code with A100 40G, I get CUDA oom. What is the GPU memory you have...
Nice results @neuralchen, did you continue training after 14k steps as well? Are you also training it in fp16 only?
Oh, I see. I've never used DeepSpeed before, so let me check. Can we resume training in fp16 using DeepSpeed? I mean, if we load the model in fp16, accelerate will...
> > Models are trained in fp16 > > Hi, when training in fp16, have you encountered the issue "Attempting to unscale FP16 gradients."? I'm facing the same issue.
This is not a solution; it will train the model in float32 instead of float16.
How many epochs did you train it for, @Aaron2117? The author's results are also bad on some images sometimes, but if yours are bad on almost all test images, then...
> @neuralchen Could you tell us the reason? The training hyperparameters are exactly the same as those in our paper. Our model was trained for 36000 steps with a batch size of...