Results 5 issues of Aman Madaan

### Description The report > N sentences where **Sys A** > **Sys B** at sentence-level BLEU will generate wrong output if: a) Sys A never generates sentences that have a...

Hi, it seems that the token-level averaged NLL loss is logged for all loss types: https://github.com/atcbosselut/comet-commonsense/blob/070aad114600b36296ef8420325e3d4cef0be470/src/train/train.py#L118 If you can confirm that this is indeed an issue, I can submit a...

### Discussed in https://github.com/google-research/tuning_playbook/discussions/3 Originally posted by **madaan** January 19, 2023 Thanks, the playbook looks pretty cool! I am curious about: > Normalization should be the last operation before the...

Allow formatting functions to be used for creating a prompt dynamically. Currently, prompts are created either by reading from a file or by using prefixes for question/answer. This excludes use...

enhancement
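To illustrate the request above, here is a minimal sketch of what a pluggable prompt-formatting function could look like. It is not taken from the project: `Example`, `prefix_formatter`, `context_formatter`, and `build_prompt` are hypothetical names, and the `question`/`context` keys are assumed for illustration only.

```python
from typing import Callable, Dict

# Hypothetical types and names; nothing here is the project's actual API.
Example = Dict[str, str]
PromptFormatter = Callable[[Example], str]


def prefix_formatter(example: Example) -> str:
    """Build a prompt from fixed question/answer prefixes."""
    return f"Q: {example['question']}\nA:"


def context_formatter(example: Example) -> str:
    """Prepend an optional context line, then reuse the prefix format."""
    context = example.get("context", "")
    header = f"Context: {context}\n" if context else ""
    return header + prefix_formatter(example)


def build_prompt(example: Example,
                 formatter: PromptFormatter = prefix_formatter) -> str:
    """Create a prompt dynamically by delegating to a caller-supplied function."""
    return formatter(example)
```

Passing the formatter as an argument keeps prompt layout out of the core code, so a new layout only needs a new function, for example `build_prompt(example, formatter=context_formatter)`.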

Add details on using `--cached_timestamp` for inference.

documentation