Training loss

Open DevKretov opened this issue 4 years ago • 7 comments

Hello,

I was wondering whether it is possible to add some loss metrics to the training loop. The only thing I see while training an Electra model is

1275000/3000000 = 42.5%, SPS: 3.1, ELAP: 9:24:02, ETA: 6 days, 11:55:19

which says nothing about how good the model is. I'm trying to add some code to the estimator, but it seems to me that it would be much easier if all the metrics were shown, so that one can see how well the model is doing at this stage.

I'm training a non-English model, so I wanted to get better insight into how it is performing at the moment.

Thanks

DevKretov avatar Mar 28 '20 08:03 DevKretov

You can always use tensorboard though!

008karan avatar Mar 30 '20 12:03 008karan

@008karan Hi Karan. I am not a TF user. Can you please explain how to use tensorboard in this case? I see a tfevents file in the model directory, but it does not seem to be written for tensorboard. The script uses tf.estimator.tpu.TPUEstimator to train the model, so I don't know how to extract the loss. Thank you very much in advance!

chriskhanhtran avatar May 01 '20 21:05 chriskhanhtran

I trained my model on GPU, and using tensorboard works the same way there. You will find events.out files in your checkpoint folder. Just run tensorboard on that folder.
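For anyone who would rather read the numbers in a script than in the tensorboard UI, here is a rough sketch. It assumes the event files follow the usual `events.out.tfevents.*` naming and that the loss was written as a scalar under a tag called `loss`; the tag name is a guess, so check what `available_scalar_tags` returns first. The helper names are made up for this example. Reading the files needs the `tensorboard` package installed.

```python
import glob
import os


def find_event_files(model_dir):
    """Return all TensorBoard event files under model_dir (searched recursively)."""
    pattern = os.path.join(model_dir, "**", "events.out.tfevents.*")
    return sorted(glob.glob(pattern, recursive=True))


def available_scalar_tags(model_dir):
    """List the scalar tags stored in the event files of model_dir."""
    # Imported lazily so find_event_files works without tensorboard installed.
    from tensorboard.backend.event_processing.event_accumulator import EventAccumulator

    acc = EventAccumulator(model_dir)
    acc.Reload()  # actually reads the event files from disk
    return acc.Tags()["scalars"]


def read_scalar(model_dir, tag="loss"):
    """Return (step, value) pairs for one scalar tag. The default tag is an assumption."""
    from tensorboard.backend.event_processing.event_accumulator import EventAccumulator

    acc = EventAccumulator(model_dir)
    acc.Reload()
    return [(event.step, event.value) for event in acc.Scalars(tag)]
```

Usage would be something like `read_scalar("path/to/checkpoint/folder", "loss")`, with the path replaced by your own model directory.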

008karan avatar May 02 '20 05:05 008karan

@008karan Thank you Karan! It works for me now that I use tensorboard==1.15.0. Do you know how the author can continuously get the evaluation metrics as in this thread? I can only get the evaluation metrics at the end of my training run.

chriskhanhtran avatar May 02 '20 19:05 chriskhanhtran

In tensorboard you get the loss and learning rate. I think you can add whatever you want to the logs to see it in tensorboard!

008karan avatar May 03 '20 13:05 008karan

When I trained Electra small on my Spanish corpus on GPU, the loss was shown. Now I got access to TFRC and trained it on a TPU pod, and the loss is not shown. Of course, I can get it from the Tensorboard events, but it would be great to log it by default when running on TPU.

mrm8488 avatar May 06 '20 22:05 mrm8488

You can set the tensorflow log level to info and it will be much more verbose, including printing the loss.
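A minimal sketch of what that could look like, assuming TensorFlow's messages go through the standard Python logger named `tensorflow` (which is what `tf.get_logger()` returns). The function name is made up for this example. Call it before training starts; if the training script sets its own verbosity afterwards, that later call wins and would need to be changed instead.

```python
import logging


def enable_tf_info_logging():
    """Raise the verbosity of TensorFlow's Python logger to INFO."""
    # Assumption: TensorFlow logs through logging.getLogger("tensorflow").
    # With TF 1.x imported as tf, tf.logging.set_verbosity(tf.logging.INFO)
    # should have the same effect.
    logger = logging.getLogger("tensorflow")
    logger.setLevel(logging.INFO)
    return logger


if __name__ == "__main__":
    enable_tf_info_logging()
```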

nemani avatar May 30 '20 12:05 nemani