Bug (?) - evaluation returns high scores when NaN values are produced.
Hello! First of all, thank you for a great package. I have started using it to benchmark some models, but I think I have found a potential issue. As the title states, if NaN values are produced during evaluation, the model in question will receive high scores, which can of course be very misleading.
A way to reproduce this:
- Train a TransE model, e.g., with this configuration file:
```yaml
job.type: train
dataset.name: wnrr

train:
  optimizer: Adagrad
  optimizer_args:
    lr: 0.2

valid:
  every: 5
  metric: mean_reciprocal_rank_filtered

model: transe
lookup_embedder:
  dim: 100
  regularize_weight: 0.8e-7
```
- Train it and arrive at a model.
- Change the code in `transe.py` on lines 22-23, from

```python
elif combine == "_po":
    out = -torch.cdist(o_emb - p_emb, s_emb, p=self._norm)
```
to
```python
elif combine == "_po":
    out = -torch.cdist(o_emb / 0, s_emb, p=self._norm)
```
This gives scores > 0.5 for all metrics, which is of course problematic. I know this modification is deliberately incorrect; it is not what I did when I discovered the issue, but it is a simple example that shows what can happen.
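My understanding of why this happens (I have not verified this against the evaluation code): rank-based metrics count how many candidates score higher than the correct triple, and any comparison against NaN evaluates to false, so a NaN score for the correct triple produces rank 1 and a perfect reciprocal rank. A minimal sketch in plain PyTorch illustrating this:

```python
import torch

# Scores for all candidate entities; the score of the correct triple is NaN.
candidate_scores = torch.randn(1000)
true_score = torch.tensor(float("nan"))

# Rank = 1 + number of candidates scoring strictly higher than the correct triple.
# Comparisons with NaN are always False, so no candidate counts as "higher".
rank = 1 + (candidate_scores > true_score).sum().item()
print(rank)        # 1, the best possible rank
print(1.0 / rank)  # reciprocal rank 1.0, inflating the MRR
```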
Perhaps a callback during evaluation that checks that no score values are NaN would be in order?
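Something along the lines of this sketch might suffice (the helper name and where to hook it in are just assumptions on my part):

```python
import torch

def _check_scores(scores: torch.Tensor) -> torch.Tensor:
    # Hypothetical guard: fail fast instead of silently producing
    # misleading metrics when a model emits NaN scores.
    if torch.isnan(scores).any():
        raise ValueError("model returned NaN scores during evaluation")
    return scores
```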
Thank you!
Thanks & yes, this sounds like a good idea and should probably directly integrated into the evaluation code. Are you willing to do a PR? It may suffice to only throw an error if the score of the correct triple is NaN (which is, I guess, the reason for this problem).
Yes, I will have a look at it. I also think additional tests should be added; I might look into that if I find the time.
Great, thanks!