
Metrics Usage Issue

TristaCao opened this issue on Jun 18 '20 • 4 comments

Bug description
Metrics specified via the -mcs parameter of eval_model.py do not work for our fine-tuned model: they do not show up in the evaluation logs or in the report.

Reproduction steps

python parlai/scripts/eval_model.py -t blended_skill_talk -mcs bleu -mf tmp/test_train_90M

To get the fine-tuned model, we ran:

python examples/train_model.py -t blended_skill_talk -m transformer/generator --multitask-weights 1,3,3,3 --init-model zoo:tutorial_transformer_generator/model --dict-file zoo:tutorial_transformer_generator/model.dict --embedding-size 512 --n-layers 8 --ffn-size 2048 --dropout 0.1 --n-heads 16 --learn-positional-embeddings True --n-positions 512 --variant xlm --activation gelu --skip-generation True --fp16 True --text-truncate 512 --label-truncate 128 --dict-tokenizer bpe --dict-lower True -lr 1e-06 --optimizer adamax --lr-scheduler reduceonplateau --gradient-clip 0.1 -veps 0.25 --betas 0.9,0.999 --update-freq 1 --attention-dropout 0.0 --relu-dropout 0.0 --skip-generation True -vp 15 -stim 60 -vme 20000 -bs 16 -vmt ppl -vmm min --save-after-valid True --model-file tmp/test_train_90M

Expected behavior
BLEU scores appear in the evaluation logs or in the report.

Logs
Command line output:

Dictionary: loading dictionary from tmp/test_train_90M.dict
[ num words =  54944 ]
Total parameters: 87,508,992 (87,508,992 trainable)
[ Loading existing model params from tmp/test_train_90M ]
[ Evaluating task blended_skill_talk using datatype valid. ]
[creating task(s): blended_skill_talk]
[loading parlAI text data:/ParlAI/data/blended_skill_talk/valid.txt]
/ParlAI/parlai/core/torch_generator_agent.py:801: RuntimeWarning: --skip-generation does not produce accurate metrics beyond ppl
  RuntimeWarning,
10s elapsed:
{"%done": "2.69%", "exs": 152, "gpu_mem": 0.06411, "loss": 2.847, "ppl": 17.24, "time_left": "362s", "token_acc": 0.4053, "tpb": 17.03}
20s elapsed:
{"%done": "5.38%", "exs": 304, "gpu_mem": 0.0641, "loss": 2.831, "ppl": 16.97, "time_left": "352s", "token_acc": 0.4039, "tpb": 17.83}

Additional context
Would it be possible to refer to a custom metric added to metrics.py as a new class? For instance:

class new_metric(AverageMetric):
    @staticmethod
    def compute(guess, answers):  # stub following the compute() pattern of ParlAI's built-in metrics
        ...

What functions do we need to change to call this new metric by specifying "-mcs new_metric"?

TristaCao avatar Jun 18 '20 21:06 TristaCao

To fix your immediate problem, add --skip-generation false.
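For example, combining the evaluation command from the report above with that flag:

python parlai/scripts/eval_model.py -t blended_skill_talk -mcs bleu -mf tmp/test_train_90M --skip-generation false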

If you want to add some custom metrics, we should talk more and you should describe what kind of metric you want to add.

stephenroller avatar Jun 19 '20 01:06 stephenroller

I need to better document this but the gist is:

If your metric is good for being computed at the teacher (string) level, then adding a custom_evaluation function to your teacher is a really good idea. Check out #2738 for an example.
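A rough sketch of that teacher-level approach (the teacher subclass, metric name, and import path are assumptions for illustration; the custom_evaluation hook and self.metrics.add mirror the pattern in the linked example):

from typing import Optional, Tuple
from parlai.core.message import Message
from parlai.core.metrics import AverageMetric
# Import path assumed: the blended_skill_talk task used in this issue.
from parlai.tasks.blended_skill_talk.agents import DefaultTeacher

class FirstWordTeacher(DefaultTeacher):  # illustrative subclass name
    def custom_evaluation(
        self, teacher_action: Message, labels: Optional[Tuple[str]], model_response: Message
    ) -> None:
        # Skip examples with no generated text (e.g. --skip-generation true) or no gold label.
        if 'text' not in model_response or not labels:
            return
        # Toy string-level metric: does the response start with the same word as the label?
        match = model_response['text'].split()[:1] == labels[0].split()[:1]
        self.metrics.add('first_word_match', AverageMetric(int(match)))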

If your metric is better computed by the agent (e.g. special loss functions), then there's a different mechanism to do it. Basically in your agent, you should call self.global_metrics.add('metric_name', AverageMetric(1.0)) or similar.

For example, see the "tokens per batch" metric in torch agent: https://github.com/facebookresearch/ParlAI/blob/db7600e498f3217d45f5b2df12dadd1ae2492ad5/parlai/core/torch_agent.py#L1919
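A minimal sketch of that agent-side mechanism, assuming a generator agent like the one trained above (the subclass and metric name are illustrative; the only ParlAI call relied on is the self.global_metrics.add pattern described here):

from parlai.core.metrics import AverageMetric
from parlai.core.torch_generator_agent import TorchGeneratorAgent

class MyGeneratorAgent(TorchGeneratorAgent):  # illustrative subclass
    def compute_loss(self, batch, return_output=False):
        loss, model_output = super().compute_loss(batch, return_output=True)
        # Record a custom agent-side metric; ParlAI averages it across batches in the report.
        self.global_metrics.add('constant_one', AverageMetric(1.0))
        return (loss, model_output) if return_output else loss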

stephenroller avatar Jun 19 '20 02:06 stephenroller

We have considered a proposal for something like @register_metric, similar to what you've suggested, but it hasn't been implemented yet.

stephenroller avatar Jun 19 '20 02:06 stephenroller

This issue has not had activity in 30 days. Please feel free to reopen if you have more issues. You may apply the "never-stale" tag to prevent this from happening.

github-actions[bot] avatar Jul 20 '20 00:07 github-actions[bot]

Closing in favor of a to-be-filed issue regarding @register_metric.

klshuster avatar Nov 09 '22 22:11 klshuster