Pengfei Liu
If necessary, we can introduce the concept of a version (e.g., `explainaboard`) and represent it as `sub_dataset_name`, for example:

```
dataset = load_dataset("sst2", "explainaboard")
```
For example, running

```
explainaboard --task summarization --custom_dataset_paths ./data/system_outputs/cnndm/cnndm_mini-dataset.tsv --system_outputs ./data/system_outputs/cnndm/cnndm_mini-bart-output.txt --metrics rouge2
```

took more than 15 seconds, which I guess is relatively slow?
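
To make a timing report like this reproducible, the wall-clock time of a command can be measured with a small wrapper. This is only a minimal sketch: the helper name `time_command` is hypothetical, and the stand-in command below simply starts the Python interpreter so the example runs anywhere.

```python
import shlex
import subprocess
import sys
import time
from typing import Tuple


def time_command(command: str) -> Tuple[float, int]:
    """Run a command string and return (elapsed seconds, exit code)."""
    args = shlex.split(command)
    start = time.perf_counter()
    completed = subprocess.run(args, capture_output=True, text=True)
    elapsed = time.perf_counter() - start
    return elapsed, completed.returncode


if __name__ == "__main__":
    # Stand-in command (an assumption for illustration); substitute the
    # explainaboard invocation quoted above to time the real run.
    elapsed, code = time_command(f"{shlex.quote(sys.executable)} -c pass")
    print(f"exit code {code}, {elapsed:.2f} s")
```

Running the wrapper a few times and comparing the numbers helps separate one-off costs, such as a first-time download, from the steady-state cost.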
## The time is almost ripe to consider transferring resources from ExplainaBoard 1.0 to 2.0

### Purposes:
* ExplainaBoard 2.0 could be enhanced by introducing system outputs collected from ExplainaBoard...
## Purpose:
* We can provide some out-of-the-box functions so that users can use them to convert the system output’s format from A to B, where
  * A: could the...
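
A conversion helper of this kind could look like the following minimal sketch. The two formats used here (tab-separated rows as A, JSON Lines as B), the function name `tsv_to_jsonl`, and the field names are all assumptions for illustration, not formats named in the issue.

```python
import csv
import io
import json
from typing import List


def tsv_to_jsonl(tsv_text: str, field_names: List[str]) -> str:
    """Convert tab-separated rows into one JSON object per line."""
    reader = csv.reader(
        io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    lines = []
    for row in reader:
        if not row:
            continue  # skip blank lines
        if len(row) != len(field_names):
            raise ValueError(
                f"expected {len(field_names)} columns, got {len(row)}"
            )
        record = dict(zip(field_names, row))
        lines.append(json.dumps(record, ensure_ascii=False))
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical two-column system output: text and label.
    sample = "a good movie\tpositive\nboring\tnegative\n"
    print(tsv_to_jsonl(sample, ["text", "true_label"]))
```

Rejecting rows with the wrong number of columns makes a malformed system output fail at conversion time rather than later during analysis.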
### Purpose:
* ExplainaBoard 2.0 should cover all features that ExplainaBoard 1.0-XTREME has.

### Methodology
* Re-format system outputs: how to format system outputs?
* Benchmark leaderboard: How to...
After discussion with Kiril and Haris, the idea came up of enabling ExplainaBoard to record hyper-parameter features for each task. To achieve this, the potential workflow is:
* design a...
It would be nice if users could export figures from the generated analysis reports that can be used directly in their papers.
- [ ] supported by SDK
-...
- [ ] NQ (https://ai.google.com/research/NaturalQuestions/dataset)
- [ ] TriviaQA (https://nlp.cs.washington.edu/triviaqa/) datasets
- [ ] HotpotQA
- [ ] DROP
I guess `statistics` will be an important concept throughout the project; keeping it well organized and documented will be good for both developers and users. (Also, caching statistics from different scenarios...
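
One way to picture cached statistics is a compute-once, read-many helper keyed by dataset name and `sub_dataset_name`. The sketch below is an assumption for illustration only: the function names, the JSON file layout, and the vocabulary-count statistic are hypothetical and not taken from the project.

```python
import json
import tempfile
from collections import Counter
from pathlib import Path
from typing import Callable, Dict, Iterable, Optional


def vocab_statistics(samples: Iterable[str]) -> Dict[str, int]:
    """Example statistic: whitespace-token frequencies over the samples."""
    counts: Counter = Counter()
    for text in samples:
        counts.update(text.split())
    return dict(counts)


def load_or_compute_statistics(
    cache_dir: str,
    dataset_name: str,
    sub_dataset_name: Optional[str],
    compute: Callable[[], Dict[str, int]],
) -> Dict[str, int]:
    """Return cached statistics if present, else compute and store them."""
    suffix = sub_dataset_name or "default"
    cache_file = Path(cache_dir) / f"{dataset_name}__{suffix}.json"
    if cache_file.exists():
        return json.loads(cache_file.read_text(encoding="utf-8"))
    stats = compute()
    cache_file.parent.mkdir(parents=True, exist_ok=True)
    cache_file.write_text(json.dumps(stats), encoding="utf-8")
    return stats


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as cache_dir:
        stats = load_or_compute_statistics(
            cache_dir, "sst2", None,
            lambda: vocab_statistics(["a b a", "b c"]),
        )
        print(stats)
```

Keying the cache file on both names keeps statistics from different versions of the same dataset from overwriting each other.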