
Metrics Usage Issue

TristaCao opened this issue on Jun 18 '20 • 4 comments

Bug description
Metrics specified via the -mcs parameter of eval_model.py do not work for our fine-tuned model: they do not show up in the evaluation logs or in the report.

Reproduction steps

python parlai/scripts/eval_model.py -t blended_skill_talk -mcs bleu -mf tmp/test_train_90M

To get the fine-tuned model, we ran:

python examples/train_model.py -t blended_skill_talk -m transformer/generator --multitask-weights 1,3,3,3 --init-model zoo:tutorial_transformer_generator/model --dict-file zoo:tutorial_transformer_generator/model.dict --embedding-size 512 --n-layers 8 --ffn-size 2048 --dropout 0.1 --n-heads 16 --learn-positional-embeddings True --n-positions 512 --variant xlm --activation gelu --skip-generation True --fp16 True --text-truncate 512 --label-truncate 128 --dict-tokenizer bpe --dict-lower True -lr 1e-06 --optimizer adamax --lr-scheduler reduceonplateau --gradient-clip 0.1 -veps 0.25 --betas 0.9,0.999 --update-freq 1 --attention-dropout 0.0 --relu-dropout 0.0 --skip-generation True -vp 15 -stim 60 -vme 20000 -bs 16 -vmt ppl -vmm min --save-after-valid True --model-file tmp/test_train_90M

Expected behavior
BLEU scores appear in the evaluation logs or in the report.

Logs
Command line output:

Dictionary: loading dictionary from tmp/test_train_90M.dict
[ num words =  54944 ]
Total parameters: 87,508,992 (87,508,992 trainable)
[ Loading existing model params from tmp/test_train_90M ]
[ Evaluating task blended_skill_talk using datatype valid. ]
[creating task(s): blended_skill_talk]
[loading parlAI text data:/ParlAI/data/blended_skill_talk/valid.txt]
/ParlAI/parlai/core/torch_generator_agent.py:801: RuntimeWarning: --skip-generation does not produce accurate metrics beyond ppl
  RuntimeWarning,
10s elapsed:
{"%done": "2.69%", "exs": 152, "gpu_mem": 0.06411, "loss": 2.847, "ppl": 17.24, "time_left": "362s", "token_acc": 0.4053, "tpb": 17.03}
20s elapsed:
{"%done": "5.38%", "exs": 304, "gpu_mem": 0.0641, "loss": 2.831, "ppl": 16.97, "time_left": "352s", "token_acc": 0.4039, "tpb": 17.83}

Additional context
Would it be possible to refer to a custom metric added to metrics.py as a new class? For instance:

class new_metric(AverageMetric):
    @staticmethod
    def compute(guess, answers):  # stub following the compute() pattern of ParlAI's built-in metrics
        ...

What functions do we need to change to call this new metric by specifying "-mcs new_metric"?

TristaCao avatar Jun 18 '20 21:06 TristaCao

To fix your immediate problem, add --skip-generation false.
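For example, combining the evaluation command from the report above with that flag:

python parlai/scripts/eval_model.py -t blended_skill_talk -mcs bleu -mf tmp/test_train_90M --skip-generation false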

If you want to add some custom metrics, we should talk more and you should describe what kind of metric you want to add.

stephenroller avatar Jun 19 '20 01:06 stephenroller

I need to better document this but the gist is:

If your metric is good for being computed at the teacher (string) level, then adding a custom_evaluation function to your teacher is a really good idea. Check out #2738 for an example.
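A rough sketch of that teacher-level approach (the teacher subclass, metric name, and import path are assumptions for illustration; the custom_evaluation hook and self.metrics.add mirror the pattern in the linked example):

from typing import Optional, Tuple
from parlai.core.message import Message
from parlai.core.metrics import AverageMetric
# Import path assumed: the blended_skill_talk task used in this issue.
from parlai.tasks.blended_skill_talk.agents import DefaultTeacher

class FirstWordTeacher(DefaultTeacher):  # illustrative subclass name
    def custom_evaluation(
        self, teacher_action: Message, labels: Optional[Tuple[str]], model_response: Message
    ) -> None:
        # Skip examples with no generated text (e.g. --skip-generation true) or no gold label.
        if 'text' not in model_response or not labels:
            return
        # Toy string-level metric: does the response start with the same word as the label?
        match = model_response['text'].split()[:1] == labels[0].split()[:1]
        self.metrics.add('first_word_match', AverageMetric(int(match)))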

If your metric is better computed by the agent (e.g. special loss functions), then there's a different mechanism to do it. Basically in your agent, you should call self.global_metrics.add('metric_name', AverageMetric(1.0)) or similar.

For example, see the "tokens per batch" metric in torch agent: https://github.com/facebookresearch/ParlAI/blob/db7600e498f3217d45f5b2df12dadd1ae2492ad5/parlai/core/torch_agent.py#L1919
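A minimal sketch of that agent-side mechanism, assuming a generator agent like the one trained above (the subclass and metric name are illustrative; the only ParlAI call relied on is the self.global_metrics.add pattern described here):

from parlai.core.metrics import AverageMetric
from parlai.core.torch_generator_agent import TorchGeneratorAgent

class MyGeneratorAgent(TorchGeneratorAgent):  # illustrative subclass
    def compute_loss(self, batch, return_output=False):
        loss, model_output = super().compute_loss(batch, return_output=True)
        # Record a custom agent-side metric; ParlAI averages it across batches in the report.
        self.global_metrics.add('constant_one', AverageMetric(1.0))
        return (loss, model_output) if return_output else loss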

stephenroller avatar Jun 19 '20 02:06 stephenroller

We have considered a proposal for something like @register_metric, similar to what you've suggested, but it hasn't been implemented yet.

stephenroller avatar Jun 19 '20 02:06 stephenroller

This issue has not had activity in 30 days. Please feel free to reopen if you have more issues. You may apply the "never-stale" tag to prevent this from happening.

github-actions[bot] avatar Jul 20 '20 00:07 github-actions[bot]

Closing in favor of a to-be-filed issue regarding @register_metric.

klshuster avatar Nov 09 '22 22:11 klshuster