
Add loss/cost on validation data https://github.com/uma-pi1/kge/issues/2

Open samuelbroscheit opened this issue 4 years ago • 42 comments

As the title says, this PR adds loss/cost on validation data to address https://github.com/uma-pi1/kge/issues/2 . It reuses the relevant parts of TrainingJob to run the training job on validation data. This also brings an additional feature: TrainingJobs can now run on arbitrary splits; all hard-coded dependencies on the train split are removed, and the split is determined only by the arguments that are propagated down.

samuelbroscheit avatar May 16 '20 21:05 samuelbroscheit

Why is this needed?

We already have the capability to run a training job on arbitrary splits and to select the splits for filtering (in case of negative sampling). E.g., one may simply run a training job using "train.split=valid".

What may be missing is to (1) disable the optimizer, (2) run just one epoch, and (3) disable checkpoints. Is this what this PR is doing?
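For example, such a run might be configured along these lines (hypothetical fragment; the exact keys may differ from the actual kge config):

```yaml
# Run a training job on the validation split instead of train
# (illustrative fragment; key names follow the kge config conventions)
job.type: train
train.split: valid
```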

rgemulla avatar May 18 '20 09:05 rgemulla

We already have the capability to run a training job on arbitrary splits and to select the splits for filtering (in case of negative sampling). E.g., one may simply run a training job using "train.split=valid".

What may be missing is to (1) disable the optimizer, (2) run just one epoch, and (3) disable checkpoints. Is this what this PR is doing?

Yes, there are so many things that would have to be disabled (disable optimizer, disable backward, no valid job instantiation, basically everything in .run(), ...) just to run it as a new instance of TrainingJob. I found it more sensible to call run_epoch from the valid_job, which has access to its parent_job anyhow. It works very nicely and, of course, I tested that it gives the same results as before for all TrainingJobs.

samuelbroscheit avatar May 18 '20 09:05 samuelbroscheit

I see & sounds good. Can you very briefly outline the approach/changes? That simplifies the code review.

rgemulla avatar May 18 '20 09:05 rgemulla

I do think that we should create a new instance of the training job from the valid job though (so that it can have a different config). That's cleaner and more future-proof than changing code in the training jobs.

rgemulla avatar May 18 '20 09:05 rgemulla

This would also allow its use from an arbitrary eval job (not necessarily valid)

rgemulla avatar May 18 '20 09:05 rgemulla

This would also allow its use from an arbitrary eval job (not necessarily valid)

I agree. However, I started with an approach where the TrainingJob acted depending on a flag, but that made it so ugly and hard to read, with "if"s everywhere. So I ended up with the current solution. I am thinking about whether there is another way, because all we care about is run_epoch(). In the current approach it has the signature

trace_entry = self.run_epoch(
    split=self.train_split, echo_trace=True, do_backward=True
)

samuelbroscheit avatar May 18 '20 10:05 samuelbroscheit

That's fine, but you can still create a new job during valid and then just call run_epoch. The first two arguments are not needed (1st = part of the job, 2nd = result of run_epoch). I'd rename the third argument to forward_only.
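The proposed interface might then look roughly like this (an illustrative sketch, not the actual kge code; the trace contents here are placeholders):

```python
# Sketch of the proposed run_epoch interface: the split comes from the job
# itself and the trace entry is returned, so only forward_only remains.
class TrainingJob:
    def __init__(self, split="train"):
        self.train_split = split

    def run_epoch(self, forward_only=False):
        # forward pass over self.train_split; when forward_only is set,
        # the backward pass and optimizer step would be skipped
        trace_entry = {"split": self.train_split, "forward_only": forward_only}
        return trace_entry

# During evaluation, create a job on the valid split and run one
# forward-only epoch to obtain the validation loss trace:
eval_job = TrainingJob(split="valid")
trace = eval_job.run_epoch(forward_only=True)
```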

rgemulla avatar May 18 '20 10:05 rgemulla

Should the training job just produce a separate trace entry and echo it as well? I think it's nicer when we retrieve the trace result and do

metrics.update(
    avg_loss=train_trace_entry["avg_loss"],
    avg_penalty=train_trace_entry["avg_penalty"],
    avg_penalties=train_trace_entry["avg_penalties"],
    avg_cost=train_trace_entry["avg_cost"],
)

WDYT?

samuelbroscheit avatar May 20 '20 08:05 samuelbroscheit

You mean the evaluation job? I'd only copy avg_loss; the other entries depend on config settings and do not actually evaluate the model. They can still be found in the trace entry of the subtraining job if needed.

rgemulla avatar May 20 '20 09:05 rgemulla

You mean the evaluation job? I'd only copy avg_loss; the other entries depend on config settings and do not actually evaluate the model. They can still be found in the trace entry of the subtraining job if needed.

Ok. Should the subtraining job echo its trace as well?

samuelbroscheit avatar May 20 '20 10:05 samuelbroscheit

My current plan is:

create __init_for_train__ and __init_for_eval__, which are branched to from __init__, and

run_for_train and run_for_eval, which are branched into from run().

The run_epoch() trace can be retrieved with a hook.

WDYT?

samuelbroscheit avatar May 20 '20 10:05 samuelbroscheit

Is this all needed? Why not just stick with run_epoch with flag forward_only?

To ease things up, we could additionally create an EvalTrainingLossJob: it creates the TrainingJob, calls run_epoch, and does everything else that's needed.
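A minimal sketch of such a wrapper, assuming a forward-only run_epoch and using a stub in place of the real TrainingJob (class and attribute names are illustrative, not the actual kge API):

```python
# Stub standing in for a forward-only TrainingJob on the eval split.
class StubTrainingJob:
    def __init__(self, split):
        self.train_split = split

    def run_epoch(self, forward_only=False):
        # placeholder trace values; the real job would compute these
        return {"avg_loss": 0.37, "avg_penalty": 0.01}

class EvalTrainingLossJob:
    """Eval job that wraps a TrainingJob to compute loss on an eval split."""
    def __init__(self, split):
        # create the training job on the evaluation split
        self._train_job = StubTrainingJob(split)

    def run(self):
        trace = self._train_job.run_epoch(forward_only=True)
        # copy only avg_loss; the other entries depend on training config
        return {"avg_loss": trace["avg_loss"]}

metrics = EvalTrainingLossJob("valid").run()
```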

rgemulla avatar May 20 '20 11:05 rgemulla

Is this all needed? Why not just stick with run_epoch with flag forward_only?

Ah, did you propose to create the TrainingJob but not call run() on it, only run_epoch? That violates the Job API a bit, but sure, that works.

To ease things up, we could additionally create an EvalTrainingLossJob: it creates the TrainingJob, calls run_epoch, and does everything else that's needed.

What would be the advantage of that?

samuelbroscheit avatar May 20 '20 12:05 samuelbroscheit

Advantage: (i) that additional job has our Job API and gathers all the "wrapper code" for reuse in other places. E.g., (ii) it could also be run directly from the config (if we wanted that) and (iii) it can be easily used as parts of other jobs (again, if we wanted that).

For now, key advantage would be (i), not sure if (ii)+(iii) will ever matter.

I am also OK with just using run_epoch for now. It's just that the wrapper job is cleaner, easier to maintain, and probably little additional effort to create.

rgemulla avatar May 20 '20 12:05 rgemulla

Alright, makes sense and should be easy to do.


samuelbroscheit avatar May 20 '20 13:05 samuelbroscheit

I wonder whether eval.type should take a list of eval types, run all of them, and collect their metrics into one trace.

samuelbroscheit avatar May 22 '20 09:05 samuelbroscheit

But this runs multiple eval jobs, right? This could be implemented by having an "aggregate eval" job, which does this. Let's keep this for the future.

rgemulla avatar May 22 '20 10:05 rgemulla

Can you review again?

samuelbroscheit avatar May 22 '20 12:05 samuelbroscheit

I addressed your review comments.

I ended up doing a small revision of EvaluationJob: EvaluationJob.run() is now implemented in EvaluationJob itself; it does the standard things that have to be done for every EvaluationJob and then calls self._run() (suggestions for a better name for self._run()?).

I shifted EntityRanking-specific code from EvaluationJob to the EntityRanking job.

I added self.verbose to EvaluationJob and use it.

I made EvalTrainingLossJob an instance variable of EvaluationJob and run it as a post_epoch_trace_hook. This can be made nicer when we create an AggregatingEvalJob that runs and aggregates a list of EvaluationJobs. Then the post_epoch_trace_hook can be removed from EvaluationJob and replaced by the entry "eval.type: entity_ranking, training_loss" in config-default.yaml. See https://github.com/uma-pi1/kge/issues/102
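The hook wiring could look roughly like this (a minimal sketch with placeholder values; the hook name follows the comment above, the internals are assumptions):

```python
# Minimal sketch of running the training-loss computation as a
# post-epoch trace hook of the evaluation job.
class EvaluationJob:
    def __init__(self):
        self.post_epoch_trace_hooks = []

    def run(self):
        trace = {"mrr": 0.5}  # placeholder metrics from the standard eval
        # let registered hooks extend the trace entry
        for hook in self.post_epoch_trace_hooks:
            hook(self, trace)
        return trace

def training_loss_hook(job, trace):
    # in the PR, this would run the embedded EvalTrainingLossJob and
    # merge its avg_loss into the evaluation trace
    trace["avg_loss"] = 0.37  # placeholder

job = EvaluationJob()
job.post_epoch_trace_hooks.append(training_loss_hook)
result = job.run()
```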

samuelbroscheit avatar May 23 '20 00:05 samuelbroscheit

Please also add a CHANGELOG entry before you merge this PR.

rgemulla avatar May 25 '20 09:05 rgemulla

Your last changes raise the question of whether all training jobs handle indexes correctly when used in an "eval only" setting. E.g., for KvsAll, what are the labels being used? Training data? Training + eval data?

rgemulla avatar May 25 '20 12:05 rgemulla

Your last changes raise the question of whether all training jobs handle indexes correctly when used in an "eval only" setting. E.g., for KvsAll, what are the labels being used? Training data? Training + eval data?

True, for eval only, KvsAll should use training + eval data.

samuelbroscheit avatar May 25 '20 13:05 samuelbroscheit

For the other jobs it's clearer, I guess, but I feel that the loss can turn out to be misleading.

I think we should not make the training_loss a default for the evaluation job (i.e., do not add it as a hook). It may be too confusing for a default since it depends on training/setup so much.

rgemulla avatar May 25 '20 17:05 rgemulla

For the other jobs it's clearer, I guess, but I feel that the loss can turn out to be misleading.

I think we should not make the training_loss a default for the evaluation job (i.e., do not add it as a hook). It may be too confusing for a default since it depends on training/setup so much.

Actually, for KvsAll it's also clear; it's the same as for all losses. It should use only the labels from the eval data, i.e., the unseen labels.

The only peculiarity with negative sampling is that the current implementation filters positive triples based on what it is given as train.split. But actually, that is always the known KG, i.e., the train split.

samuelbroscheit avatar May 25 '20 19:05 samuelbroscheit

Why is it clear for KvsAll? If done as you state, the losses are not comparable between train and valid (the valid loss will look pretty much like 1vsAll, because the data probably looks like this). Nevertheless, this may be the best option. Another sensible choice may be the prefixes/suffixes from eval, but the labels from train+eval.

rgemulla avatar May 25 '20 19:05 rgemulla

For negative sampling, the filtering split (or splits?) can already be specified, I think.

rgemulla avatar May 25 '20 19:05 rgemulla

For negative sampling, the filtering split (or splits?) can already be specified, I think.

Yes, but at the moment it is automagically the split defined as train.split if not set otherwise. However, train.split is set to "valid" when TrainingJob is run as evaluation. https://github.com/uma-pi1/kge/blob/f2bcbdb47776e6a3b73d07cca28603bbce723f88/kge/util/sampler.py#L33-L34

This is why one has to set it explicitly, and the current implementation of negative sampling with filtering didn't cover this case.
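An explicit override along these lines would keep filtering against the train split (hypothetical fragment; the exact key name is an assumption):

```yaml
# Filter negatives against the train split even when train.split is valid
# (illustrative; check config-default.yaml for the actual key)
negative_sampling.filtering.split: train
```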

samuelbroscheit avatar May 25 '20 20:05 samuelbroscheit

For negative sampling, the filtering split (or splits?) can already be specified, I think.

Yes but at the moment it is automagically the split defined as train.split if not set otherwise.

So which one to pick here? The most natural choice would be train+valid, but that does not seem to be supported.

rgemulla avatar May 26 '20 09:05 rgemulla

Why is it clear for KvsAll? If done like you state, the losses are not comparable between train and valid (valid loss will be pretty much like 1vsAll because the data probably looks like this). Nevertheless this may be the best option. Another sensible choice may be the prefixes/suffixes from eval, but the labels from train+eval.

When we were talking about train+eval, this is what I meant: prefixes from valid, but labels from both. I understand what you mean, that the loss on train and the loss on valid are not comparable in that case. With "it is clear" I meant that the valid-only KvsAll loss would still work to measure overfitting. It would be nicer if it were a comparable loss, though.
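The "prefixes from valid, labels from train+valid" variant can be sketched with toy triples (illustrative only, not the actual kge index code):

```python
# KvsAll label construction: labels from both splits, queries only from valid.
from collections import defaultdict

train = [("a", "p", "b"), ("a", "p", "c")]
valid = [("a", "p", "d")]

# labels come from train + valid ...
labels = defaultdict(set)
for s, p, o in train + valid:
    labels[(s, p)].add(o)

# ... but the queries (prefixes) come only from valid
queries = sorted({(s, p) for s, p, o in valid})
```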

samuelbroscheit avatar May 26 '20 11:05 samuelbroscheit

For negative sampling, the filtering split (or splits?) can already be specified, I think.

Yes but at the moment it is automagically the split defined as train.split if not set otherwise.

So which one to pick here? The most natural choice would be train+valid, but that does not seem to be supported.

So should we support it?

samuelbroscheit avatar May 26 '20 11:05 samuelbroscheit