
Rewrite entity ranking evaluation to sample sp/po pairs and add loss

Open samuelbroscheit opened this issue 5 years ago • 11 comments

https://github.com/rufex2001/kge/blob/9d83e43f5085e4a0d30d70536e4c1772389907cd/kge/job/entity_ranking.py#L72

samuelbroscheit avatar Apr 02 '19 16:04 samuelbroscheit

Clarification: for probabilistic models, add computation of cross entropy loss to evaluation

rgemulla avatar Apr 02 '19 19:04 rgemulla

On 2nd thought, not sure how useful this is (you'd win by always predicting 1). Samuel, please clarify or close.

rgemulla avatar Apr 04 '19 16:04 rgemulla

How would you win for labels [0,0,1] with prediction [1,1,1] vs prediction [0,0,1] with BCE?
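(As a quick numerical check of this point with PyTorch, using the vectors from above and smoothing the predictions slightly so the loss stays finite:)

import torch
import torch.nn.functional as F

labels = torch.tensor([0., 0., 1.])
all_ones = torch.tensor([0.999, 0.999, 0.999])  # "always predict 1"
correct = torch.tensor([0.001, 0.001, 0.999])   # predicts the correct labels

print(F.binary_cross_entropy(all_ones, labels))  # ~4.6, heavily penalized on the two negatives
print(F.binary_cross_entropy(correct, labels))   # ~0.001, near-zero loss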

samuelbroscheit avatar Apr 04 '19 16:04 samuelbroscheit

I was referring to the cross entropy of the test triples (all label 1) and their predicted confidence. Anyway, it's not clear to me what "loss" should be added for the evaluation.

(Recall: the evaluation is currently triple-based, not sp/po-based.)

rgemulla avatar Apr 04 '19 17:04 rgemulla

I think usually you want the same loss that is used during training.

samuelbroscheit avatar Apr 04 '19 17:04 samuelbroscheit

Yes, but during evaluation we batch triples, not sp/po pairs. If this is needed, we need a different evaluation job (and another one for negative sampling, I guess).

rgemulla avatar Apr 04 '19 17:04 rgemulla

Or just get the collate func from the Trainer and join it with the eval collate func? Shouldn't be that difficult.

def get_collate_func(trainer_collate_func):
    def my_collate_func(batch):
        # evaluation-specific collation (placeholder for whatever the eval job needs)
        my_result = ...
        # reuse the trainer's collate function on the same batch
        trainer_result = trainer_collate_func(batch)
        return my_result, trainer_result
    return my_collate_func
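For example (a hypothetical sketch, not the actual kge API; valid_triples, trainer_collate_func and batch_size are assumed to be in scope), the evaluation job could then wrap its data loader like this:

import torch.utils.data

eval_loader = torch.utils.data.DataLoader(
    valid_triples,
    collate_fn=get_collate_func(trainer_collate_func),
    batch_size=batch_size,
    shuffle=False,
)

for eval_batch, train_style_batch in eval_loader:
    ...  # compute ranking metrics from eval_batch and the training loss from train_style_batch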

I always implement validation loss automatically, but I am not going to fight for it if you are not convinced.

samuelbroscheit avatar Apr 04 '19 17:04 samuelbroscheit

It's not so easy because (i) the evaluation code has to distinguish between two types of things and (ii) we are scoring the same thing multiple times.

The best approach may be to rewrite the evaluation to not use triples but the sp/po approach (i.e., the 1-to-N collate as is), and then compute MRR/HITS on top of this. In addition to enabling the computation of the loss, it should also be faster than our current approach since it computes fewer scores (e.g., if the triples (s,p,o1) and (s,p,o2) both occur in the validation data, we currently compute the sp? scores twice).
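A rough sketch of what the sp side of this could look like (the po side is symmetric); score_sp, num_entities and the label construction here are illustrative assumptions, not the actual kge interfaces:

import torch
import torch.nn.functional as F
from collections import defaultdict

def evaluate_sp(valid_triples, score_sp, num_entities):
    # group the true objects by their unique (s, p) pair so each sp? is scored only once
    objects_by_sp = defaultdict(list)
    for s, p, o in valid_triples.tolist():
        objects_by_sp[(s, p)].append(o)

    ranks, losses = [], []
    for (s, p), true_objects in objects_by_sp.items():
        # one score per candidate object
        scores = score_sp(torch.tensor([s]), torch.tensor([p])).view(-1)
        labels = torch.zeros(num_entities)
        labels[true_objects] = 1.0
        # cross entropy over all candidate objects, reusing the scores computed for ranking
        losses.append(F.binary_cross_entropy_with_logits(scores, labels))
        # rank of each true object (1-based; filtering is omitted here)
        for o in true_objects:
            ranks.append((scores > scores[o]).sum().item() + 1)

    mrr = sum(1.0 / r for r in ranks) / len(ranks)
    return mrr, torch.stack(losses).mean()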

rgemulla avatar Apr 05 '19 08:04 rgemulla

To me this is similar to including test data during model selection: people do it, but perhaps they shouldn't. While there is nothing wrong with selecting models based on the standard metrics, perhaps selecting based on the loss is better. I'd personally like to see the difference, so if it is not too much of a change, we could keep the option. Then the only required decision is what the default behavior should be.

rufex2001 avatar Apr 05 '19 13:04 rufex2001

No, it's different: it's perfectly fine to use the loss on validation data. The problem is that we cannot easily compute it right now without changing the way we implemented validation. I was suggesting to change our implementation to (1) support loss computation and (2) make it faster.

rgemulla avatar Apr 05 '19 13:04 rgemulla

Validation is currently by far the slowest part of training (when multiple jobs are run in parallel so that the GPU is saturated, most of the time is spent in validation). Addressing this issue should also help here. I'll add a priority tag because of this.

rgemulla avatar Jul 09 '19 21:07 rgemulla