
Gum-Net Training not improving with demo data

Open kaysagit opened this issue 3 years ago • 4 comments

Hi,

I was testing Gum-Net using the provided demo data set. After around 30 training iterations and roughly 15 hours of training I stopped it because there was no improvement in the loss; please see the logs of the training procedure below.

```
Before finetuning:
Rotation error: 1.7350925500744534 +/- 0.6650064011311111
Translation error: 8.442523177067761 +/- 3.44293514383784
----------
Training Iteration 0
4/4 [==============================] - 1784s 404s/step - loss: 0.8216
Training Iteration 1
4/4 [==============================] - 1781s 405s/step - loss: 0.8218
Training Iteration 2
4/4 [==============================] - 1774s 404s/step - loss: 0.8251
Training Iteration 3
4/4 [==============================] - 1788s 406s/step - loss: 0.8274
Training Iteration 4
4/4 [==============================] - 1783s 405s/step - loss: 0.8334
Training Iteration 5
4/4 [==============================] - 1782s 405s/step - loss: 0.8201
Training Iteration 6
4/4 [==============================] - 1777s 405s/step - loss: 0.8250
Training Iteration 7
4/4 [==============================] - 1797s 407s/step - loss: 0.8310
Training Iteration 8
4/4 [==============================] - 1787s 407s/step - loss: 0.8336
Training Iteration 9
4/4 [==============================] - 1784s 406s/step - loss: 0.8207
Training Iteration 10
4/4 [==============================] - 1787s 406s/step - loss: 0.8258
Training Iteration 11
4/4 [==============================] - 1779s 405s/step - loss: 0.8235
Training Iteration 12
4/4 [==============================] - 1784s 406s/step - loss: 0.8296
Training Iteration 13
4/4 [==============================] - 1773s 402s/step - loss: 0.8271
Training Iteration 14
4/4 [==============================] - 1773s 403s/step - loss: 0.8199
Training Iteration 15
4/4 [==============================] - 1785s 406s/step - loss: 0.8315
Training Iteration 16
4/4 [==============================] - 1789s 407s/step - loss: 0.8264
Training Iteration 17
4/4 [==============================] - 1777s 405s/step - loss: 0.8336
Training Iteration 18
4/4 [==============================] - 1774s 403s/step - loss: 0.8299
Training Iteration 19
4/4 [==============================] - 1790s 407s/step - loss: 0.8303
Training Iteration 20
4/4 [==============================] - 1784s 406s/step - loss: 0.8244
Training Iteration 21
4/4 [==============================] - 1786s 407s/step - loss: 0.8242
Training Iteration 22
4/4 [==============================] - 1789s 406s/step - loss: 0.8245
Training Iteration 23
4/4 [==============================] - 1782s 406s/step - loss: 0.8253
Training Iteration 24
4/4 [==============================] - 1789s 405s/step - loss: 0.8258
Training Iteration 25
4/4 [==============================] - 1784s 406s/step - loss: 0.8238
Training Iteration 26
4/4 [==============================] - 1782s 405s/step - loss: 0.8200
Training Iteration 27
4/4 [==============================] - 1779s 405s/step - loss: 0.8282
Training Iteration 28
4/4 [==============================] - 1780s 405s/step - loss: 0.8251
Training Iteration 29
2/4 [==============>...............] - ETA: 19:00 - loss: 0.8142
```

Do you have any suggestions or an explanation for why training with your demo dataset is not working? I did not change the source code.

Kind regards!

kaysagit avatar Feb 19 '22 17:02 kaysagit
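For reference, the "Rotation error: X +/- Y" figures in the log above are a mean and standard deviation of per-pair alignment errors over the evaluation set. Below is a minimal sketch of how such metrics can be computed with NumPy, assuming predicted and ground-truth rigid transforms are available as (rotation matrix, translation vector) pairs; the function names are illustrative, not the repository's actual API:

```python
import numpy as np

def rotation_error(R_pred, R_true):
    """Angular distance (radians) between two 3x3 rotation matrices.

    Uses the identity trace(R_pred^T @ R_true) = 1 + 2*cos(theta).
    """
    cos_theta = (np.trace(R_pred.T @ R_true) - 1.0) / 2.0
    return np.arccos(np.clip(cos_theta, -1.0, 1.0))

def translation_error(t_pred, t_true):
    """Euclidean distance (in voxels) between two translation vectors."""
    return np.linalg.norm(np.asarray(t_pred) - np.asarray(t_true))

def summarize(errors):
    """Format a list of errors as 'mean +/- std', like the log above."""
    errors = np.asarray(errors)
    return f"{errors.mean()} +/- {errors.std()}"

# Hypothetical usage, with pred/true as lists of (R, t) pairs:
# rot_errs = [rotation_error(Rp, Rt) for (Rp, _), (Rt, _) in zip(pred, true)]
# trans_errs = [translation_error(tp, tt) for (_, tp), (_, tt) in zip(pred, true)]
# print("Rotation error:", summarize(rot_errs))
# print("Translation error:", summarize(trans_errs))
```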

Discussed

xiangruz avatar Feb 25 '22 22:02 xiangruz

I got a similar result. Could you tell me how to solve it? Thank you.

JachyLikeCoding avatar Apr 22 '22 00:04 JachyLikeCoding

This is my result:

```
Epoch 1/1
100/100 [==============================] - 1388s 14s/step - loss: 0.8265
Training Iteration 4
Epoch 1/1
100/100 [==============================] - 1387s 14s/step - loss: 0.8275
Training Iteration 5
Epoch 1/1
100/100 [==============================] - 1388s 14s/step - loss: 0.8268
Training Iteration 6
Epoch 1/1
100/100 [==============================] - 1388s 14s/step - loss: 0.8315
Training Iteration 7
Epoch 1/1
100/100 [==============================] - 1389s 14s/step - loss: 0.8324
Training Iteration 8
Epoch 1/1
100/100 [==============================] - 1390s 14s/step - loss: 0.8297
Training Iteration 9
Epoch 1/1
100/100 [==============================] - 1390s 14s/step - loss: 0.8312
Training Iteration 10
Epoch 1/1
100/100 [==============================] - 1389s 14s/step - loss: 0.8302
Training Iteration 11
Epoch 1/1
100/100 [==============================] - 1387s 14s/step - loss: 0.8302
Training Iteration 12
Epoch 1/1
100/100 [==============================] - 1389s 14s/step - loss: 0.8308
Training Iteration 13
Epoch 1/1
100/100 [==============================] - 1389s 14s/step - loss: 0.8249
Training Iteration 14
Epoch 1/1
100/100 [==============================] - 1390s 14s/step - loss: 0.8299
Training Iteration 15
Epoch 1/1
100/100 [==============================] - 1388s 14s/step - loss: 0.8318
Training Iteration 16
Epoch 1/1
100/100 [==============================] - 1389s 14s/step - loss: 0.8263
Training Iteration 17
Epoch 1/1
100/100 [==============================] - 1389s 14s/step - loss: 0.8288
Training Iteration 18
Epoch 1/1
100/100 [==============================] - 1389s 14s/step - loss: 0.8289
Training Iteration 19
Epoch 1/1
100/100 [==============================] - 1389s 14s/step - loss: 0.8290
```

JachyLikeCoding avatar Apr 22 '22 00:04 JachyLikeCoding

> I got a similar result. Could you tell me how to solve it? Thank you.

What we observed is that on the low-SNR dataset (using the pre-trained model), the loss sometimes does not decrease, but if you output the transformation error before and after finetuning, it does improve. Hopefully that helps!

xiangruz avatar Apr 27 '22 22:04 xiangruz
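Following the suggestion above, one way to verify this is to report the transformation errors before and after finetuning instead of relying on the loss curve alone. Below is a minimal sketch of that check; `evaluate_errors` is a hypothetical stand-in for the repository's evaluation step, not a real Gum-Net function:

```python
import numpy as np

def evaluate_errors(model, pairs):
    """Hypothetical placeholder for the repo's evaluation step: should
    return arrays of per-pair rotation and translation errors for the
    current model on the given subtomogram pairs."""
    ...

def report(label, rot_errs, trans_errs):
    """Print errors in the same 'mean +/- std' format as the training log."""
    print(f"{label}:")
    print(f"  Rotation error: {np.mean(rot_errs)} +/- {np.std(rot_errs)}")
    print(f"  Translation error: {np.mean(trans_errs)} +/- {np.std(trans_errs)}")

# Hypothetical usage around the existing finetuning loop:
# rot0, trans0 = evaluate_errors(model, test_pairs)
# report("Before finetuning", rot0, trans0)
# ... run the finetuning iterations as in the demo script ...
# rot1, trans1 = evaluate_errors(model, test_pairs)
# report("After finetuning", rot1, trans1)
```

A flat loss together with decreasing rotation/translation errors would indicate the finetuning is in fact working, which matches the behavior described above.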