
Bug (?) - evaluation returns high scores when NaN-values are returned.

Open Filco306 opened this issue 4 years ago • 3 comments

Hello! First of all, thank you for a great package. I have started using it to benchmark some models, but I think I have found a potential issue. As the description states, if NaN values are produced during evaluation, the model in question will receive high scores, which can of course be very misleading.

A way to reproduce this:

  1. Train a TransE-model, e.g., with this configuration file:
job.type: train
dataset.name: wnrr

train:
  optimizer: Adagrad
  optimizer_args:
    lr: 0.2

valid:
  every: 5
  metric: mean_reciprocal_rank_filtered

model: transe
lookup_embedder:
  dim: 100
  regularize_weight: 0.8e-7
  2. Train it and arrive at a model.
  3. Change the code in transe.py on lines 22-23, from
elif combine == "_po":
            out = -torch.cdist(o_emb - p_emb, s_emb, p=self._norm)

to

elif combine == "_po":
            out = -torch.cdist(o_emb/0, s_emb, p=self._norm)

This will give scores > 0.5 for all metrics, which is of course problematic. I know this modification is incorrect; it is not what I did when I discovered the issue, but it is a simple example that shows what can happen.
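To illustrate why NaN scores inflate the metrics rather than tank them: ranks are typically computed by counting how many candidate scores are strictly greater than the score of the correct triple, and every comparison involving NaN evaluates to False. A minimal standalone sketch (not libkge's actual evaluation code) of this effect:

```python
import torch

# all candidate entities receive NaN scores, as in the modified transe.py
scores = torch.full((5,), float("nan"))
true_score = scores[0]  # score of the correct triple is also NaN

# rank = 1 + number of candidates scored strictly higher than the true triple;
# NaN comparisons are always False, so the count is 0 and the rank is a
# "perfect" 1, giving MRR = 1.0 instead of signaling a broken model
rank = 1 + (scores > true_score).sum().item()
print(rank)  # 1
```

So a model that produces only NaN scores looks like a perfect predictor under rank-based metrics.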

I think a callback during evaluation that checks that no values are NaN would perhaps be appropriate?

Thank you!

Filco306 avatar Sep 02 '21 08:09 Filco306

Thanks & yes, this sounds like a good idea and should probably be integrated directly into the evaluation code. Are you willing to do a PR? It may suffice to throw an error only if the score of the correct triple is NaN (which is, I guess, the reason for this problem).
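A minimal sketch of such a check, assuming it is called on the score tensor before ranks are computed (the function name and placement are hypothetical, not libkge's actual API):

```python
import torch

def assert_no_nan_scores(scores: torch.Tensor) -> torch.Tensor:
    """Raise instead of silently ranking NaN scores during evaluation.

    Hypothetical helper: fail fast if any score is NaN, since NaN
    comparisons would otherwise yield misleadingly good ranks.
    """
    if torch.isnan(scores).any():
        raise ValueError("model produced NaN scores during evaluation")
    return scores

# finite scores pass through unchanged
assert_no_nan_scores(torch.tensor([1.0, 2.0, 3.0]))
```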

rgemulla avatar Sep 02 '21 09:09 rgemulla

Yes, I will have a look at it. I think further tests should also be built, I might have a look at that if I get the time.

Filco306 avatar Sep 02 '21 17:09 Filco306

Great, thanks!

rgemulla avatar Sep 03 '21 09:09 rgemulla