
Add tracking of global loss in each epoch

Open · ducovrossem opened this issue 10 years ago · 8 comments

Something to look at while training the model :)

global_loss += 0.5 * entry_weight * (prediction - c_log(count)) **2
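
For context, a minimal sketch of what that line accumulates, assuming the usual GloVe parametrisation where the dot product of the word and context vectors plus the two biases approximates log(count). The names here are illustrative, not the actual Cython variables:

```python
import numpy as np

# Sketch of the proposed per-entry term for one non-zero co-occurrence
# entry (i, j): the model's prediction is compared against log(count).
def entry_loss(w_i, w_j, b_i, b_j, count, entry_weight):
    prediction = np.dot(w_i, w_j) + b_i + b_j
    return 0.5 * entry_weight * (prediction - np.log(count)) ** 2

# Summing this over every non-zero entry in an epoch gives the global loss.
print(entry_loss(np.full(5, 0.1), np.full(5, 0.2), 0.0, 0.0,
                 count=10.0, entry_weight=0.18))
```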

ducovrossem avatar Dec 19 '14 10:12 ducovrossem

Makes sense to track the loss. A couple of comments:

  1. Why not just use a primitive double that's initialized in Cython, and then maybe returned from the fit_vectors function? We could then avoid the array + instance attribute approach (see the sketch after this list).
  2. We should only print the loss if verbose == True.
  3. If you are using more than one thread, the loss will be approximate (as the threads will be writing over each other).
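
A rough sketch of points 1 and 2, in plain Python rather than Cython, with hypothetical names (`fit_vectors` here only stands in for the real training routine):

```python
import numpy as np

def fit_vectors(predictions, counts, entry_weights):
    # In Cython this would be a plain cdef double updated inside the
    # training loop, instead of a one-element array stored on the instance.
    loss = 0.0
    for prediction, count, entry_weight in zip(predictions, counts, entry_weights):
        loss += 0.5 * entry_weight * (prediction - np.log(count)) ** 2
    return loss

# Caller side: only report the number when verbose is set.
verbose = True
predictions = np.array([0.1, 2.3, 1.0])
counts = np.array([1.0, 10.0, 3.0])
entry_weights = np.array([0.03, 0.18, 0.08])
for epoch in range(2):
    loss = fit_vectors(predictions, counts, entry_weights)
    if verbose:
        print("Epoch %d, loss %.5f" % (epoch, loss))
```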

maciejkula avatar Dec 19 '14 11:12 maciejkula

  1. Changed it around. Had thought about your proposed structure but went with the previous method only because it seemed less of a rewrite. Why avoid the array + instance approach?
  2. Added.
  3. One could keep a separate global_loss accumulator per thread (one per no_threads) and add them up at the end (sketched after this list)... I am not sure how important a detail this is.
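
A sketch of that per-thread idea in plain Python (hypothetical names; the real code would use one C double per thread inside the Cython loop):

```python
import numpy as np
from threading import Thread

def fit_shard(predictions, counts, entry_weights, losses, thread_id):
    # Each thread accumulates into its own slot; summing afterwards avoids
    # races on a single shared total.
    loss = 0.0
    for prediction, count, entry_weight in zip(predictions, counts, entry_weights):
        loss += 0.5 * entry_weight * (prediction - np.log(count)) ** 2
    losses[thread_id] = loss

shards = [  # toy data split across two threads
    (np.array([0.1, 1.2]), np.array([2.0, 5.0]), np.array([0.05, 0.1])),
    (np.array([0.7]), np.array([9.0]), np.array([0.15])),
]
losses = [0.0] * len(shards)
threads = [Thread(target=fit_shard, args=shard + (losses, i))
           for i, shard in enumerate(shards)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sum(losses))  # deterministic regardless of thread scheduling
```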

ducovrossem avatar Dec 19 '14 17:12 ducovrossem

It just seems strange to have a one-element array when the same purpose can be served by just having a single number.

Regarding the multithreading issue: I don't think this is a huge problem, just wanted to point out that the loss numbers can be non-deterministic.

maciejkula avatar Dec 23 '14 11:12 maciejkula

I was going to go in sneakily after this is merged and do that :)

@ducovrossem as Radim pointed out, it will be more efficient to only calculate the (prediction - c_log(count)) portion once and then re-use it in the expressions for loss and global_loss.

As for the 0.5 factor: because the expression for the gradient of the loss does not have a 2 * entry_weight scaling factor, it implies that the original loss was 0.5 * entry_weight * (x - y) ** 2. As far as I know this is quite common, because it allows us to drop the constant 2 in the derivation.
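
A small illustration of both points, with hypothetical names (not the actual kernel code):

```python
import numpy as np

def entry_update(prediction, count, entry_weight):
    # Compute the shared term once and reuse it for both quantities.
    diff = prediction - np.log(count)
    loss = 0.5 * entry_weight * diff ** 2   # contribution to global_loss
    # d(loss)/d(prediction) = entry_weight * diff: the 0.5 cancels the 2
    # coming from differentiating diff ** 2, so no factor of 2 shows up
    # in the gradient update.
    gradient = entry_weight * diff
    return loss, gradient

print(entry_update(prediction=0.3, count=7.0, entry_weight=0.12))
```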

maciejkula avatar Dec 23 '14 12:12 maciejkula

Actually, why not factor out log(count) completely, out of the main loop, into the cooccurrence matrix? In other words, do cooccurrence_matrix = log(cooccurrence_matrix).

Or are the actual original counts/weights needed anywhere else, apart from log(count)?
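
For a sparse co-occurrence matrix that would be something along these lines (a sketch; `matrix` is assumed to be a scipy.sparse COO matrix of raw counts):

```python
import numpy as np
import scipy.sparse as sp

matrix = sp.coo_matrix(np.array([[0.0, 2.0], [5.0, 0.0]]))
log_matrix = matrix.copy()
log_matrix.data = np.log(log_matrix.data)  # log taken once, outside the training loop
```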

Re. the 0.5 factor: makes sense, thanks!

piskvorky avatar Dec 24 '14 16:12 piskvorky

The raw value is used for the weighting (and I also quite like the fact that the co-occurrence matrix is marginally model agnostic and could conceivably support a different application).

In general there are a fair number of things that can still be factored out so that they happen just once (look at the bias updates, for instance). I'll probably do a pass soon and get those out of the way.
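
To illustrate the weighting point: the entry weight is a function of the raw count (shown here with the x_max = 100, alpha = 0.75 defaults from the GloVe paper), while only the regression target is the log, so the raw matrix can't simply be replaced by its log. Names are illustrative:

```python
import numpy as np

def entry_weight(count, x_max=100.0, alpha=0.75):
    # GloVe weighting f(x): grows with the raw count and is capped at 1.
    return min((count / x_max) ** alpha, 1.0)

count = 12.0
weight = entry_weight(count)  # needs the raw count
target = np.log(count)        # the part that could be precomputed
print(weight, target)
```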

maciejkula avatar Dec 24 '14 20:12 maciejkula

I know this pull request has gotten stale but is there any interest in getting it merged? I've managed to merge it into master locally and would be willing to fork and open a new pull request where we can discuss it. Three years have passed but it still looks like a valuable addition!

IronFarm avatar Jan 03 '18 10:01 IronFarm

It'd be really great to add this feature!

gokceneraslan avatar Nov 19 '18 21:11 gokceneraslan