Gum-Net Training not improving with demo data
Hi,
I was testing Gum-Net with the provided demo data set. After around 30 training iterations and roughly 15 hours of training I stopped it, because the loss showed no improvement; please see the training logs below.
```
Before finetuning:
Rotation error: 1.7350925500744534 +/- 0.6650064011311111
Translation error: 8.442523177067761 +/- 3.44293514383784
----------
Training Iteration 0
4/4 [==============================] - 1784s 404s/step - loss: 0.8216
Training Iteration 1
4/4 [==============================] - 1781s 405s/step - loss: 0.8218
Training Iteration 2
4/4 [==============================] - 1774s 404s/step - loss: 0.8251
Training Iteration 3
4/4 [==============================] - 1788s 406s/step - loss: 0.8274
Training Iteration 4
4/4 [==============================] - 1783s 405s/step - loss: 0.8334
Training Iteration 5
4/4 [==============================] - 1782s 405s/step - loss: 0.8201
Training Iteration 6
4/4 [==============================] - 1777s 405s/step - loss: 0.8250
Training Iteration 7
4/4 [==============================] - 1797s 407s/step - loss: 0.8310
Training Iteration 8
4/4 [==============================] - 1787s 407s/step - loss: 0.8336
Training Iteration 9
4/4 [==============================] - 1784s 406s/step - loss: 0.8207
Training Iteration 10
4/4 [==============================] - 1787s 406s/step - loss: 0.8258
Training Iteration 11
4/4 [==============================] - 1779s 405s/step - loss: 0.8235
Training Iteration 12
4/4 [==============================] - 1784s 406s/step - loss: 0.8296
Training Iteration 13
4/4 [==============================] - 1773s 402s/step - loss: 0.8271
Training Iteration 14
4/4 [==============================] - 1773s 403s/step - loss: 0.8199
Training Iteration 15
4/4 [==============================] - 1785s 406s/step - loss: 0.8315
Training Iteration 16
4/4 [==============================] - 1789s 407s/step - loss: 0.8264
Training Iteration 17
4/4 [==============================] - 1777s 405s/step - loss: 0.8336
Training Iteration 18
4/4 [==============================] - 1774s 403s/step - loss: 0.8299
Training Iteration 19
4/4 [==============================] - 1790s 407s/step - loss: 0.8303
Training Iteration 20
4/4 [==============================] - 1784s 406s/step - loss: 0.8244
Training Iteration 21
4/4 [==============================] - 1786s 407s/step - loss: 0.8242
Training Iteration 22
4/4 [==============================] - 1789s 406s/step - loss: 0.8245
Training Iteration 23
4/4 [==============================] - 1782s 406s/step - loss: 0.8253
Training Iteration 24
4/4 [==============================] - 1789s 405s/step - loss: 0.8258
Training Iteration 25
4/4 [==============================] - 1784s 406s/step - loss: 0.8238
Training Iteration 26
4/4 [==============================] - 1782s 405s/step - loss: 0.8200
Training Iteration 27
4/4 [==============================] - 1779s 405s/step - loss: 0.8282
Training Iteration 28
4/4 [==============================] - 1780s 405s/step - loss: 0.8251
Training Iteration 29
2/4 [==============>...............] - ETA: 19:00 - loss: 0.8142
```
Do you have any suggestions, or an explanation for why training on your demo dataset is not working? I did not change the source code.
Kind regards!
I got a similar result. Could you tell me how to solve it? Thank you.
This is my result:
```
Epoch 1/1
100/100 [==============================] - 1388s 14s/step - loss: 0.8265
Training Iteration 4
Epoch 1/1
100/100 [==============================] - 1387s 14s/step - loss: 0.8275
Training Iteration 5
Epoch 1/1
100/100 [==============================] - 1388s 14s/step - loss: 0.8268
Training Iteration 6
Epoch 1/1
100/100 [==============================] - 1388s 14s/step - loss: 0.8315
Training Iteration 7
Epoch 1/1
100/100 [==============================] - 1389s 14s/step - loss: 0.8324
Training Iteration 8
Epoch 1/1
100/100 [==============================] - 1390s 14s/step - loss: 0.8297
Training Iteration 9
Epoch 1/1
100/100 [==============================] - 1390s 14s/step - loss: 0.8312
Training Iteration 10
Epoch 1/1
100/100 [==============================] - 1389s 14s/step - loss: 0.8302
Training Iteration 11
Epoch 1/1
100/100 [==============================] - 1387s 14s/step - loss: 0.8302
Training Iteration 12
Epoch 1/1
100/100 [==============================] - 1389s 14s/step - loss: 0.8308
Training Iteration 13
Epoch 1/1
100/100 [==============================] - 1389s 14s/step - loss: 0.8249
Training Iteration 14
Epoch 1/1
100/100 [==============================] - 1390s 14s/step - loss: 0.8299
Training Iteration 15
Epoch 1/1
100/100 [==============================] - 1388s 14s/step - loss: 0.8318
Training Iteration 16
Epoch 1/1
100/100 [==============================] - 1389s 14s/step - loss: 0.8263
Training Iteration 17
Epoch 1/1
100/100 [==============================] - 1389s 14s/step - loss: 0.8288
Training Iteration 18
Epoch 1/1
100/100 [==============================] - 1389s 14s/step - loss: 0.8289
Training Iteration 19
Epoch 1/1
100/100 [==============================] - 1389s 14s/step - loss: 0.8290
```
What we observed is that on the low-SNR dataset (using the pre-trained model), the loss sometimes does not decrease, but if you output the transformation error before and after fine-tuning, it does improve. Hopefully this helps!
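For anyone who wants to check this on their own run, here is a minimal sketch of how one could print the transformation error before and after fine-tuning. The variable names (`model`, `x_test`, `y_test`, `ground_truth`) and the simple per-parameter error metric are assumptions for illustration, not the exact evaluation code from the Gum-Net demo; adapt them to your local script.

```python
import numpy as np

def report_transformation_error(y_true, y_pred):
    """Print mean +/- std of rotation and translation errors.

    ASSUMPTION: each row holds 6 transformation parameters,
    [phi, theta, psi, tx, ty, tz] (three Euler angles plus a 3-D shift),
    and the error is a plain L2 distance in parameter space --
    a proxy for the metric the demo prints, not a reimplementation of it.
    """
    rot_err = np.linalg.norm(y_true[:, :3] - y_pred[:, :3], axis=1)
    trans_err = np.linalg.norm(y_true[:, 3:] - y_pred[:, 3:], axis=1)
    print('Rotation error: %s +/- %s' % (rot_err.mean(), rot_err.std()))
    print('Translation error: %s +/- %s' % (trans_err.mean(), trans_err.std()))

# Hypothetical usage around the fine-tuning loop:
# y_pred_before = model.predict([x_test, y_test])   # predicted parameters
# report_transformation_error(ground_truth, y_pred_before)
# ... run the fine-tuning iterations ...
# y_pred_after = model.predict([x_test, y_test])
# report_transformation_error(ground_truth, y_pred_after)
```

If the rotation and translation errors shrink between the two calls even while the loss curve stays flat around 0.82, the fine-tuning is working as described above.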