ExplainaBoard All datasets used in ExplainaBoard 1.0 should be supported by DataLab SDK


Open pfliu-nlp opened this issue 2 years ago • 1 comment

If necessary, we can introduce the concept of a version (e.g., explainaboard) and represent it as the sub_dataset_name, for example:

from datalabs import load_dataset

dataset = load_dataset("sst2", "explainaboard")
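
To illustrate the idea of treating a version as a sub-dataset name, here is a minimal sketch. It does not use the DataLab SDK: the `REGISTRY` dictionary, the `resolve` helper, the `"default"` fallback name, and the description strings are all hypothetical stand-ins for whatever lookup the SDK performs internally.

```python
# Hypothetical in-memory registry keyed by (dataset_name, sub_dataset_name).
# It stands in for the SDK's own lookup, which is not shown in this issue.
REGISTRY = {
    ("sst2", "default"): "original SST-2 release",
    ("sst2", "explainaboard"): "SST-2 as used in ExplainaBoard 1.0",
}


def resolve(dataset_name, sub_dataset_name="default"):
    """Look up a dataset, treating the version as its sub-dataset name."""
    key = (dataset_name, sub_dataset_name)
    if key not in REGISTRY:
        raise KeyError(
            f"unsupported dataset/version: {dataset_name}/{sub_dataset_name}"
        )
    return REGISTRY[key]


# The version is selected the same way any other sub-dataset would be.
print(resolve("sst2", "explainaboard"))
```

With this layout, adding an ExplainaBoard-specific version of a dataset would only require one more registry entry, and existing callers that omit the second argument keep getting the default.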

May 04 '22 04:05 pfliu-nlp

aspect-based-sentiment-classification

[x] laptop14
[x] restaurant14
[x] restaurant16
[x] twitter

chunking

[x] conll00_chunk
[ ] conll03_chunk

word-segmentation

[ ] as
[ ] cityu
[ ] ckip
[ ] ctb
[x] msr
[ ] ncc
[ ] pku
[ ] sxu

named-entity-recognition

[x] conll2003
[ ] conll2000
[ ] ontonotes_ner + notebc
[ ] ontonotes_ner + notebn
[ ] ontonotes_ner + notemz
[ ] ontonotes_ner + notenw
[ ] ontonotes_ner + notetc
[ ] ontonotes_ner + notewb

text-classification

[x] atis
[x] cr
[x] dbpedia_14
[ ] imdb
[x] mr
[x] qc
[x] sst2
[x] sst5
[x] subj

text-pair-classification

[x] snli
[x] sick

text-summarization

[x] cnn_dailymail (we probably need a new version number)
[x] xsum (we probably need a new version number)

XTREME (?)

@neubig, do we need to consider this now? Using DataLab's dataset -> sub_dataset mapping, it would not be very difficult for us to build a composite leaderboard. For example, one simple method is to group the sub-datasets that belong to the same dataset and join their names with a tab or another separator.
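
A rough sketch of the grouping described above, assuming the mapping is available as a list of (dataset, sub_dataset) pairs. The `PAIRS` list, the `sub_a`/`sub_b`/`sub_c` names, and the `composite_keys` helper are hypothetical and only illustrate the idea; they are not DataLab or ExplainaBoard APIs.

```python
from collections import defaultdict

# Hypothetical (dataset, sub_dataset) pairs; the sub-dataset names are
# placeholders, not the names DataLab actually uses.
PAIRS = [
    ("xtreme", "sub_a"),
    ("xtreme", "sub_b"),
    ("sst2", "explainaboard"),
    ("xtreme", "sub_c"),
]


def composite_keys(pairs, sep="\t"):
    """Group sub-datasets under their parent dataset and join them with sep."""
    grouped = defaultdict(list)
    for dataset, sub_dataset in pairs:
        grouped[dataset].append(sub_dataset)
    # One composite entry per dataset; sub-datasets keep first-seen order.
    return {dataset: sep.join(subs) for dataset, subs in grouped.items()}


keys = composite_keys(PAIRS)
for dataset, joined in keys.items():
    print(dataset, repr(joined))
```

A tab is used as the default separator because it is unlikely to appear inside a sub-dataset name; any other character with that property would work equally well.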

WMT (?)

May 04 '22 05:05 pfliu-nlp
