kge
Add loss/cost on validation data https://github.com/uma-pi1/kge/issues/2
As the title says, this PR adds loss/cost on validation data to address https://github.com/uma-pi1/kge/issues/2 . It reuses the relevant parts of TrainingJob to run the training job on validation data. This also brings the additional feature that TrainingJobs can now run on arbitrary splits; all dependencies on the train split are removed, and the split is determined only by the arguments that are propagated down.
Why is this needed?
We already have the capability to run a training job on arbitrary splits and to select the splits for filtering (in case of negative sampling). E.g., one may simply run a training job using "train.split=valid".
What may be missing is (1) disable the optimizer, (2) just run one epoch, and (3) disable checkpoints. Is this what this PR is doing?
> We already have the capability to run a training job on arbitrary splits and to select the splits for filtering (in case of negative sampling). E.g., one may simply run a training job using "train.split=valid".
> What may be missing is (1) disable the optimizer, (2) just run one epoch, and (3) disable checkpoints. Is this what this PR is doing?
Yes, there are so many things that have to be disabled (optimizer, backward pass, valid-job instantiation, basically everything in .run(), ...) just to run it as a new instance of TrainingJob. I found it more sensible to call run_epoch from the valid job, which has access to its parent_job anyhow. It works very nicely and, of course, I tested that it gives the same results as before for all TrainingJobs.
I see & sounds good. Can you very briefly outline the approach/changes? That simplifies the code review.
I do think that we should create a new instance of the training job from the valid job though (so that it can have a different config). That's cleaner and more future-proof than changing code in the training jobs.
This would also allow its use from an arbitrary eval job (not necessarily valid)
> This would also allow its use from an arbitrary eval job (not necessarily valid)
I agree. However, I started with an approach where the TrainingJob was acting depending on a flag, but that made it so ugly and hard to read with "if"s everywhere. So I ended up with the current solution. I'm thinking about whether there is another way, because all we care about is run_epoch(). In the current approach it has the signature
```python
trace_entry = self.run_epoch(
    split=self.train_split, echo_trace=True, do_backward=True
)
```
That's fine, but you can still create a new job during valid. Then just call run_epoch. The first two arguments are not needed (1st = part of the job, 2nd = result of run_epoch). I'd rename the third argument to forward_only.
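A minimal sketch of how that suggestion might look, purely for illustration (TrainingJob.create, the config clone, eval_split, and the forward_only flag follow the discussion above; the exact kge signatures are assumptions):

```python
# Illustrative sketch only: the evaluation job creates its own TrainingJob on the
# evaluation split and calls run_epoch in forward-only mode. Names and signatures
# (create, clone, forward_only, eval_split) are assumptions, not the actual kge API.
from kge.job import TrainingJob  # assumed import path

def evaluate_training_loss(eval_job):
    config = eval_job.config.clone()                 # assumed: copy of the config
    config.set("train.split", eval_job.eval_split)   # train job reads the eval split
    train_job = TrainingJob.create(config, eval_job.dataset, parent_job=eval_job)
    # one epoch, no optimizer step / backward pass
    trace_entry = train_job.run_epoch(forward_only=True)
    return trace_entry
```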
Should the training job just produce a separate trace entry and echo it as well? I think it's nicer when we retrieve the trace result and do
```python
metrics.update(
    avg_loss=train_trace_entry["avg_loss"],
    avg_penalty=train_trace_entry["avg_penalty"],
    avg_penalties=train_trace_entry["avg_penalties"],
    avg_cost=train_trace_entry["avg_cost"],
)
```
WDYT?
You mean the evaluation job? I'd only copy the avg_loss, the other entries are depending on config settings and do not actually evaluate the model. They can still be found in the trace entry of the subtraining job if needed.
> You mean the evaluation job? I'd only copy the avg_loss, the other entries are depending on config settings and do not actually evaluate the model. They can still be found in the trace entry of the subtraining job if needed.
Ok. Should the subtraining job echo its trace as well?
My current plan is: create __init_for_train__ and __init_for_eval__, which are branched to in __init__, and run_for_train and run_for_eval, which are branched into from run(). The run_epoch() trace can be retrieved with a hook. WDYT?
Is this all needed? Why not just stick with run_epoch with flag forward_only?
To ease things up, we could additionally create an EvalTrainingLossJob: it creates the TrainingJob, calls run_epoch, and does everything else that's needed.
> Is this all needed? Why not just stick with run_epoch with flag forward_only?
Ah, did you propose to create the TrainingJob but not call run() on it, only run_epoch? That violates the Job API a bit, but sure, that works.
> To ease things up, we could additionally create an EvalTrainingLossJob: it creates the TrainingJob, calls run_epoch, and does everything else that's needed.
What would be the advantage of that?
Advantage: (i) that additional job has our Job API and gathers all the "wrapper code" for reuse in other places. E.g., (ii) it could also be run directly from the config (if we wanted that) and (iii) it can be easily used as parts of other jobs (again, if we wanted that).
For now, key advantage would be (i), not sure if (ii)+(iii) will ever matter.
I am also OK with just using run_epoch for now. It's just that the wrapper job is cleaner, easier to maintain, and probably little additional effort to create.
Alright, makes sense and should be easy to do.
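A rough sketch of what such a wrapper job could look like, assuming the interfaces discussed in this thread (class layout, constructor signature, and return value are illustrative, not the merged code):

```python
# Illustrative sketch of the proposed wrapper job; all details are assumptions
# based on the discussion above, not the final implementation.
class EvalTrainingLossJob(EvaluationJob):
    """Evaluation job that reports the training loss on the evaluation split."""

    def __init__(self, config, dataset, parent_job=None, model=None):
        super().__init__(config, dataset, parent_job, model)
        # internal TrainingJob, configured to read the evaluation split;
        # it is only ever used for forward passes
        self._train_job = TrainingJob.create(
            config, dataset, parent_job=self, model=model
        )

    def run(self):
        # single forward-only epoch: no optimizer step, no checkpoints,
        # no nested valid job
        train_trace_entry = self._train_job.run_epoch(forward_only=True)
        # only avg_loss is copied into the evaluation result; the full trace
        # stays with the sub-training job
        return {"avg_loss": train_trace_entry["avg_loss"]}
```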
I wonder if eval.type should not take a list of eval types, run all of them, and collect their metrics into one trace.
But this runs multiple eval jobs, right? This could be implemented by having an "aggregate eval" job, which does this. Let's keep this for the future.
Can you review again?
I addressed your reviews.
I ended up doing a small revision of EvaluationJob: EvaluationJob.run() is now implemented in EvaluationJob itself; it does the standard stuff that has to be done for every EvaluationJob and then calls self._run() (suggestions for a better name for self._run()?). A rough sketch of this split is shown below.
I shifted EntityRanking-specific stuff from EvaluationJob there.
I added self.verbose to EvaluationJob and use it.
I made EvalTrainingLossJob an instance variable of EvaluationJob and run it as a post_epoch_trace_hook. This can be made nicer when we create an AggregatingEvalJob that runs and aggregates a list of EvaluationJobs. Then the post_epoch_trace_hook can be removed from EvaluationJob and replaced by the entry "eval.type: entity_ranking, training_loss" in config-default.yaml. See https://github.com/uma-pi1/kge/issues/102
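The run()/_run() split mentioned above is essentially a template method; roughly (only run/_run and self.verbose come from the description above, the hook names and other details are illustrative):

```python
# Rough sketch of the run()/_run() split described above; hook names and other
# details are assumptions, not the exact merged code.
class EvaluationJob(Job):
    def run(self):
        # standard work that every evaluation job has to do
        if self.verbose:
            self.config.log("Starting evaluation...")
        trace_entry = self._run()                 # evaluation-type-specific part
        for hook in self.post_epoch_trace_hooks:  # e.g. the training-loss job
            hook(self, trace_entry)
        return trace_entry

    def _run(self):
        raise NotImplementedError                 # implemented by subclasses


class EntityRankingJob(EvaluationJob):
    def _run(self):
        ...  # entity-ranking-specific evaluation


class EvalTrainingLossJob(EvaluationJob):
    def _run(self):
        ...  # forward-only pass of a TrainingJob, as sketched earlier
```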
Please also add a CHANGELOG entry before you merge this PR.
Your last changes raise the question of whether all training jobs handle indexes correctly when used in an "eval only" setting. E.g., for KvsAll, what are the labels being used? Training data? Training + eval data?
> Your last changes raise the question of whether all training jobs handle indexes correctly when used in an "eval only" setting. E.g., for KvsAll, what are the labels being used? Training data? Training + eval data?
True, for eval only, KvsAll should use training + eval data.
For the other jobs it's clearer, I guess, but I feel that the loss can turn out to be misleading.
I think we should not make the training_loss a default for the evaluation job (i.e., do not add it as a hook). It may be too confusing for a default since it depends on training/setup so much.
> For the other jobs it's clearer, I guess, but I feel that the loss can turn out to be misleading.
> I think we should not make the training_loss a default for the evaluation job (i.e., do not add it as a hook). It may be too confusing for a default since it depends on training/setup so much.
Actually, also for KvsAll it's clear; it's the same as for all losses. It should only use the labels from eval data, i.e., for unseen labels.
The only particular thing with NS is that the current implementation does the filtering of positive triples based on what it is given as train.split. But actually that is always the known KG, i.e., the train split.
Why is it clear for KvsAll? If done like you state, the losses are not comparable between train and valid (valid loss will be pretty much like 1vsAll because the data probably looks like this). Nevertheless this may be the best option. Another sensible choice may be the prefixes/suffixes from eval, but the labels from train+eval.
For negative sampling, the filtering split (or splits?) can already be specified, I think.
> For negative sampling, the filtering split (or splits?) can already be specified, I think.
Yes, but at the moment it is automagically the split defined as train.split if not set otherwise. However, train.split is set to "valid" when TrainingJob is run as evaluation: https://github.com/uma-pi1/kge/blob/f2bcbdb47776e6a3b73d07cca28603bbce723f88/kge/util/sampler.py#L33-L34
This is why one has to set it explicitly, and the current implementation of NS with filtering didn't cover this case.
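Paraphrasing the linked sampler.py lines (simplified, not an exact quote), the fallback looks roughly like this:

```python
# Simplified paraphrase of the default described above: if no filtering split is
# configured, the sampler falls back to train.split, which is "valid" when the
# TrainingJob is run as an evaluation on the validation data.
filtering_split = config.get("negative_sampling.filtering.split")
if filtering_split == "":
    filtering_split = config.get("train.split")
```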
> For negative sampling, the filtering split (or splits?) can already be specified, I think.
> Yes, but at the moment it is automagically the split defined as train.split if not set otherwise.
So which one to pick here? The most natural choice would be train+valid, but that does not seem to be supported.
> Why is it clear for KvsAll? If done like you state, the losses are not comparable between train and valid (valid loss will be pretty much like 1vsAll because the data probably looks like this). Nevertheless this may be the best option. Another sensible choice may be the prefixes/suffixes from eval, but the labels from train+eval.
When we were talking about train+eval, this is what I meant: prefixes from valid, but labels from both. I understand what you mean that the loss on train and the loss on valid are not comparable in that case. With "it is clear" I meant that the only-on-valid KvsAll loss would still work to measure overfitting. It would be nicer if it were a comparable loss, though.
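To make the two options concrete, here is a small generic sketch (plain Python, not the kge dataset/index API; the helper name is made up) of building an sp-to-o label index from one or from several splits:

```python
# Generic illustration of the two label choices discussed above; it does not use
# the kge dataset/index API, and the helper name is hypothetical.
from collections import defaultdict

def sp_to_o_labels(*triple_sets):
    """Map each (s, p) prefix to the set of o labels found in the given splits."""
    labels = defaultdict(set)
    for triples in triple_sets:
        for s, p, o in triples:
            labels[(s, p)].add(o)
    return labels

# prefixes from valid, labels from valid only (loss then looks much like 1vsAll):
#   valid_only = sp_to_o_labels(valid_triples)
# prefixes from valid, labels from train + valid (closer to the training loss):
#   train_plus_valid = sp_to_o_labels(train_triples, valid_triples)
```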
> For negative sampling, the filtering split (or splits?) can already be specified, I think.
> Yes, but at the moment it is automagically the split defined as train.split if not set otherwise.
> So which one to pick here? The most natural choice would be train+valid, but that does not seem to be supported.
So should we?