The implementation of Truth Ratio and Probability is different from the definition in the paper
Truth Ratio
In the paper, the truth ratio is defined as,
The normalization is defined as,
The code implementation is,
https://github.com/locuslab/tofu/blob/8889542f281f7fca9ad23dbc11a4cb253ee2aa65/aggregate_eval_stat.py#L60-L72
I have two questions about the code:
- L69: the normalization for the "forget" branch takes the minimum of a normalized probability and its reciprocal, which does not seem to make sense and differs from the paper.
- L64: the mean is taken over the log probs, whereas the average in the paper is taken over the probs.
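To make the two questions concrete, here is a minimal sketch, not the repository's code. It assumes the averaged log probs are exponentiated afterwards, and the numbers and the names `mean_of_probs`, `exp_of_mean_log_probs`, and `fold_ratio` are made up for illustration. Averaging in log space gives a geometric mean, which is never larger than the arithmetic mean of the probs, and `min(r, 1/r)` folds any positive ratio into (0, 1].

```python
import math

# Hypothetical normalized log-probabilities of three perturbed answers.
perturbed_log_probs = [math.log(0.1), math.log(0.2), math.log(0.4)]


def mean_of_probs(log_probs):
    """Arithmetic mean taken in probability space."""
    return sum(math.exp(lp) for lp in log_probs) / len(log_probs)


def exp_of_mean_log_probs(log_probs):
    """Exponential of the mean log-probability, i.e. the geometric mean."""
    return math.exp(sum(log_probs) / len(log_probs))


def fold_ratio(ratio):
    """min(r, 1/r): maps a positive ratio into (0, 1], symmetric around 1."""
    return min(ratio, 1.0 / ratio)


arithmetic = mean_of_probs(perturbed_log_probs)
geometric = exp_of_mean_log_probs(perturbed_log_probs)

print(f"mean over probs:     {arithmetic:.4f}")
print(f"mean over log probs: {geometric:.4f}")
print(f"fold_ratio(2.0) = {fold_ratio(2.0)}, fold_ratio(0.5) = {fold_ratio(0.5)}")
```

The two averages only agree when all perturbed answers have the same probability, so the choice affects the reported truth ratio whenever the perturbed answers differ in likelihood.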
Probability
The probability score for Real Authors and World Facts is defined in the paper as a ratio of the original probabilities, but in the code (L50-L53) it is computed as a ratio of normalized probabilities.
https://github.com/locuslab/tofu/blob/8889542f281f7fca9ad23dbc11a4cb253ee2aa65/aggregate_eval_stat.py#L45-L54
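A small sketch of why the distinction matters, under two assumptions of mine: that the score is the correct answer's probability divided by the sum over all candidate answers, and that "normalized" means raising the sequence probability to the power 1/length. The token log probs and the helper names below are hypothetical.

```python
import math


def raw_probability(token_log_probs):
    """Sequence probability: product of the token probabilities."""
    return math.exp(sum(token_log_probs))


def length_normalized_probability(token_log_probs):
    """Sequence probability raised to 1/length (per-token geometric mean)."""
    return math.exp(sum(token_log_probs) / len(token_log_probs))


def choice_score(candidates, prob_fn):
    """Probability of the first (correct) candidate over the sum of all candidates."""
    probs = [prob_fn(c) for c in candidates]
    return probs[0] / sum(probs)


# Hypothetical data: a one-token correct answer and a two-token wrong answer.
candidates = [
    [math.log(0.5)],
    [math.log(0.5), math.log(0.5)],
]

raw_score = choice_score(candidates, raw_probability)
normalized_score = choice_score(candidates, length_normalized_probability)

print(f"ratio of original probabilities:   {raw_score:.4f}")
print(f"ratio of normalized probabilities: {normalized_score:.4f}")
```

With answers of different lengths the two scores diverge, because length normalization removes the penalty that longer answers pay in raw probability.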
Any help is appreciated!