multi-lingual evaluation?
As first steps to working on multilingual evaluation, one should:
- Read the tutorials on implementing new tasks, features, and formats.
- Get system outputs for different multilingual systems. Here are some potential sources:
  - Translation outputs from the WMT shared tasks. These outputs are often available through the WMT metrics task.
  - Summarization outputs from the XL-Sum dataset. @pfliu-nlp can help provide these.
  - Various analysis tasks from XTREME. These are already imported into ExplainaBoard 1.0, so we can download the data from there: http://explainaboard.nlpedia.ai/leaderboard/xtreme/
- Run the ExplainaBoard SDK over these tasks and generate reports (a sketch of one possible way to script this follows this list).
- Compare the reports across languages, and see whether we can extract any interesting insights about cross-lingual variation in the trends (see the report-comparison sketch below).
  - If so, dig deeper into these insights, or write analysis/visualization code that makes such comparisons easier.
  - If not, improve the functionality of the ExplainaBoard SDK so that it extracts the features we need to make the comparisons.
  - More systematically, we might also try correlating strength or weakness in particular fine-grained analysis categories with a few factors (see the correlation sketch at the end):
    - available training data, for example the size of crawled web corpora such as OSCAR or Wikipedia;
    - linguistic features of the languages, or linguistic similarity between transfer and test languages, along the lines of the analyses in papers on choosing transfer languages or on NLP performance prediction.
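For the "run the SDK" step, here is a minimal sketch of how we might script it over several languages. The flag names (`--task`, `--system-outputs`), the task name, and the file layout are assumptions rather than confirmed against the current CLI, so check `explainaboard --help` or the SDK README before using anything like this.

```python
"""Sketch: run the ExplainaBoard CLI once per language and save a JSON report.

Assumptions (to be checked against the SDK docs): an `explainaboard` command is
on PATH, it accepts --task and --system-outputs flags, and it writes the report
to stdout. The language list and file layout are hypothetical.
"""
import subprocess
from pathlib import Path

LANGS = ["de", "ru", "zh", "sw"]   # hypothetical set of target languages
TASK = "machine-translation"       # task name is an assumption

for lang in LANGS:
    system_output = Path(f"outputs/wmt_{lang}.txt")    # hypothetical output file
    report_path = Path(f"reports/report_{lang}.json")
    report_path.parent.mkdir(parents=True, exist_ok=True)
    with report_path.open("w") as f:
        # Capture the CLI's stdout as the per-language report.
        subprocess.run(
            ["explainaboard", "--task", TASK, "--system-outputs", str(system_output)],
            stdout=f,
            check=True,
        )
    print(f"wrote {report_path}")
```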
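For the comparison step, a rough sketch of pulling the per-language reports up side by side. The JSON keys used below (`overall`, `fine_grained`, `bucket_name`, `value`) are placeholders for whatever the SDK actually emits; the accessors would need to be adapted to the real report schema.

```python
"""Sketch: side-by-side comparison of per-language ExplainaBoard reports.

The report schema used here is a placeholder; adapt the key names to the
structure the SDK actually produces.
"""
import json
from pathlib import Path

LANGS = ["de", "ru", "zh", "sw"]   # same hypothetical language list as above

reports = {
    lang: json.loads(Path(f"reports/report_{lang}.json").read_text())
    for lang in LANGS
}

# Overall score per language (key name is an assumption).
for lang, rep in reports.items():
    print(f"{lang}: overall = {rep.get('overall')}")

# Per-bucket scores for one fine-grained feature, e.g. sentence length
# (the feature and key names are assumptions).
feature = "sentence_length"
for lang, rep in reports.items():
    buckets = rep.get("fine_grained", {}).get(feature, [])
    row = ", ".join(f"{b.get('bucket_name')}: {b.get('value')}" for b in buckets)
    print(f"{lang} [{feature}] {row}")
```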
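And for the "more systematic" correlation idea, a small sketch using Spearman rank correlation between per-language scores on one analysis bucket and the amount of available training data. All numbers below are made-up placeholders; in practice the scores would come from the reports above and the data sizes from OSCAR or Wikipedia dumps, over many more languages than four.

```python
"""Sketch: correlate per-language bucket scores with training-data size.

All values are placeholders for illustration only; a real analysis would use
scores read from the reports and corpus sizes measured from OSCAR / Wikipedia.
"""
from scipy.stats import spearmanr

# Placeholder numbers, NOT real corpus sizes or system scores.
corpus_tokens = {"de": 4.0, "ru": 3.0, "zh": 2.0, "sw": 1.0}     # e.g. relative OSCAR size
bucket_score = {"de": 0.42, "ru": 0.39, "zh": 0.35, "sw": 0.18}  # score on one analysis bucket

langs = sorted(corpus_tokens)
rho, p = spearmanr(
    [corpus_tokens[lang] for lang in langs],
    [bucket_score[lang] for lang in langs],
)
print(f"Spearman rho = {rho:.2f} (p = {p:.2f}) over {len(langs)} languages")
```

The same loop could be repeated per fine-grained feature, or with linguistic-similarity features in place of corpus size.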