
Add instance IDs to model outputs

Open · danieldeutsch opened this issue 4 years ago · 5 comments

It would be really useful if the system outputs had IDs that could be used to match outputs across models. For instance, the CNN/DailyMail outputs are not in the order I expected, so it's not easy for me to map my model's outputs to the ones shared by ExplainaBoard.
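To show why IDs help, here is a minimal sketch that pairs two systems' outputs on a shared instance ID instead of on file position. It assumes a hypothetical record format (a list of dicts, each with an `"id"` and an `"output"` field); this is not ExplainaBoard's actual output schema.

```python
from typing import Dict, List, Tuple


def align_by_id(
    outputs_a: List[dict], outputs_b: List[dict], key: str = "id"
) -> List[Tuple[str, str, str]]:
    """Pair two systems' outputs on a shared instance ID, ignoring file order.

    Assumes each record is a dict with a `key` field and an "output" field
    (a hypothetical format chosen for illustration).
    """
    # Index system B by ID so lookups do not depend on ordering.
    index_b: Dict[str, str] = {o[key]: o["output"] for o in outputs_b}
    if len(index_b) != len(outputs_b):
        raise ValueError("duplicate IDs in outputs_b")
    # Fail loudly rather than silently pairing the wrong instances.
    missing = [o[key] for o in outputs_a if o[key] not in index_b]
    if missing:
        raise KeyError(f"IDs missing from outputs_b: {missing[:5]}")
    # Keep system A's order; each tuple is (id, output_a, output_b).
    return [(o[key], o["output"], index_b[o[key]]) for o in outputs_a]
```

For example, `align_by_id([{"id": "a1", "output": "x"}, {"id": "b2", "output": "y"}], [{"id": "b2", "output": "Y"}, {"id": "a1", "output": "X"}])` pairs `"a1"` with `"X"` and `"b2"` with `"Y"` even though the second list is in a different order.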

danieldeutsch, Oct 21 '21 14:10

Thanks @danieldeutsch , this is a good point. I wonder what the canonical ID would be though. Maybe the sentence number in the original dataset?

neubig, Jan 31 '22 15:01

I think the choice of ID would need to be dataset-specific. As far as I am aware, there isn't really an "official" ordering of the CNN/DailyMail dataset. Each instance corresponds to a file whose name is a unique ID, and I have seen people identify instances by that ID. For other datasets, though, the instance index may be acceptable.
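A dataset-specific ID policy like the one described could be sketched as follows: use a filename-derived ID when the dataset provides one, and fall back to the instance index otherwise. The `"filename"` field and the example path are hypothetical; real datasets name this field differently, if they have it at all.

```python
from pathlib import PurePosixPath


def instance_id(record: dict, index: int) -> str:
    """Return a stable ID for one dataset instance.

    Assumption: some datasets expose a hypothetical "filename" field whose
    stem is a unique ID (as described for CNN/DailyMail); others do not.
    """
    if "filename" in record:
        # e.g. "cnn/stories/abc123.story" -> "abc123"
        return PurePosixPath(record["filename"]).stem
    # No dataset-native ID: fall back to the position in the test set.
    return str(index)
```

The index fallback is only safe when everyone agrees on a single ordering of the test set, which is exactly what CNN/DailyMail lacks.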

danieldeutsch, Jan 31 '22 17:01

Great, thanks!

@pfliu-nlp : I think some sort of traceability like this would be a good idea. Let's think about it.

neubig, Jan 31 '22 17:01

Thank you @danieldeutsch, we're also always thinking about how to introduce the notion of an ID appropriately. I think the key points here are (1) not only to augment each system output with an ID but also (2) to map this ID to a test dataset that can be regarded as the standard. So far, the new version of ExplainaBoard partially supports this, for example for SQuAD, which already has a fixed ID (the question ID). For other datasets without an official ordering, I think we can define IDs using DataLab (we will gradually add this feature).
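Point (2) implies a consistency check between a system output file and the reference test set. A minimal sketch, assuming both sides are available as plain lists of string IDs (the function name and return shape are hypothetical, not part of ExplainaBoard):

```python
from collections import Counter
from typing import Dict, Iterable, Set


def check_against_test_set(
    output_ids: Iterable[str], test_ids: Iterable[str]
) -> Dict[str, Set[str]]:
    """Compare a system's output IDs against the standard test-set IDs."""
    counts = Counter(output_ids)
    test = set(test_ids)
    return {
        # The same instance was predicted more than once.
        "duplicates": {i for i, c in counts.items() if c > 1},
        # Output IDs that do not exist in the test set.
        "unknown": set(counts) - test,
        # Test instances with no system output.
        "missing": test - set(counts),
    }
```

With SQuAD-style question IDs, `check_against_test_set(["q1", "q2", "q2", "q9"], ["q1", "q2", "q3"])` reports `"q2"` as a duplicate, `"q9"` as unknown, and `"q3"` as missing; an output file is usable only when all three sets are empty.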

pfliu-nlp, Jan 31 '22 18:01

I think using the IDs from the `datasets` library when available would be helpful, since many people now use its dataset readers. For example, here it provides unique IDs for the CNN/DailyMail dataset even though there aren't "official" ones.
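If a model is run over the rows of a dataset in order, the dataset's own IDs can be attached to the predictions before sharing them. A minimal sketch, assuming each row is a dict with an `"id"` field and that predictions are in the same order as the rows; the JSON Lines layout with `"id"`/`"output"` keys is a hypothetical format, not ExplainaBoard's.

```python
import json
from typing import Iterable, List


def to_jsonl_with_ids(rows: Iterable[dict], predictions: List[str]) -> str:
    """Attach each row's "id" to the prediction generated for it, in order."""
    rows = list(rows)
    # A length mismatch means the order-based pairing cannot be trusted.
    if len(rows) != len(predictions):
        raise ValueError(f"{len(rows)} rows but {len(predictions)} predictions")
    lines = [
        json.dumps({"id": r["id"], "output": p}) for r, p in zip(rows, predictions)
    ]
    return "\n".join(lines)
```

With the `datasets` library, `rows` could be the test split returned by `load_dataset("cnn_dailymail", "3.0.0", split="test")`, whose rows include an `id` field; iterating over it yields dicts. After this step, ordering no longer matters to anyone consuming the file.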

danieldeutsch, Jan 31 '22 20:01