ExplainaBoard
Document the concept of `statistics` [Discussion]
I think `statistics` will be an important concept throughout the project, so keeping it well organized and documented will be good for both developers and users.
(Also, caching statistics across different scenarios is one of the valuable features of ExplainaBoard.)
In general, there are the following scenarios:
Statistics of the training set
- Purpose: calculating training-set-dependent features on the fly is costly, so we need a cacheable object that stores the important statistics of a dataset and speeds up the calculation of those features.
- Caching strategies:
  - [x] store it in the local filesystem (e.g., dataset["train"]._stat)
  - [x] store it in the DB (metadata.statistics)
  - [ ] store it in S3 and put the S3 link in the DB
- SDK function
  - [`_gen_external_stats`](https://github.com/neulab/ExplainaBoard/blob/c27c3391b7090f8be41ac076bc88143ea90623e7/explainaboard/processors/processor.py#L48)
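
To make the local-filesystem strategy concrete, here is a minimal sketch of "compute once, then reuse". It is only an illustration: `get_train_stats`, the JSON layout, and the word-count statistic are hypothetical and are not what `_gen_external_stats` actually does.

```python
import json
import os
from collections import Counter


def get_train_stats(train_samples, cache_path):
    """Return training-set statistics, computing them only if no cache file exists.

    Hypothetical helper: the statistics here are simple whitespace-token counts.
    """
    if os.path.exists(cache_path):
        # Cache hit: skip the pass over the training set.
        with open(cache_path, encoding="utf-8") as f:
            return json.load(f)

    # Cache miss: one pass over the training set to collect token frequencies.
    vocab = Counter(tok for text in train_samples for tok in text.split())
    stats = {"vocab": dict(vocab), "num_samples": len(train_samples)}

    with open(cache_path, "w", encoding="utf-8") as f:
        json.dump(stats, f)
    return stats
```

The same object could be written to the DB or to S3 instead; only the load and store steps would change.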
Statistics for Scoring
- Purpose: for some metrics (e.g., those for text generation), we usually need to cache intermediate statistics for each sample (e.g., n-gram overlaps) so that downstream computations such as a non-composable overall evaluation score, confidence intervals, or significance tests can be done efficiently.
- Caching strategies:
  - [ ] store it in an in-memory dict? Would EaaS generate it and pass it to the ExplainaBoard processor?
- SDK function
  - [`_gen_scoring_stats`](https://github.com/neulab/ExplainaBoard/blob/c27c3391b7090f8be41ac076bc88143ea90623e7/explainaboard/processors/conditional_generation.py#L109)
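
As a rough illustration of why per-sample statistics are worth caching, the sketch below stores (matches, length) pairs for a corpus-level unigram precision and reuses them for a bootstrap confidence interval without touching the text again. The metric and all function names are assumptions chosen for brevity; this is not the logic of `_gen_scoring_stats`.

```python
import random
from collections import Counter


def unigram_stats(hypothesis, reference):
    """Per-sample statistics: (clipped unigram matches, hypothesis length)."""
    hyp = Counter(hypothesis.split())
    ref = Counter(reference.split())
    matches = sum(min(count, ref[tok]) for tok, count in hyp.items())
    return matches, sum(hyp.values())


def corpus_precision(stats):
    """Aggregate cached per-sample statistics into one corpus-level score.

    This is a ratio of sums, so it is not the mean of per-sample scores.
    """
    matches = sum(m for m, _ in stats)
    total = sum(t for _, t in stats)
    return matches / total if total else 0.0


def bootstrap_interval(stats, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval computed from the cached statistics only."""
    rng = random.Random(seed)
    scores = sorted(
        corpus_precision(rng.choices(stats, k=len(stats)))
        for _ in range(n_resamples)
    )
    low = scores[int(n_resamples * alpha / 2)]
    high = scores[int(n_resamples * (1 - alpha / 2)) - 1]
    return low, high


# The statistics are computed once and could live in an in-memory dict or list.
cached = [
    unigram_stats("the cat sat", "the cat sat down"),
    unigram_stats("a dog", "the dog barks"),
]
overall = corpus_precision(cached)
interval = bootstrap_interval(cached)
```

Each resample only re-sums small tuples, which is the efficiency gain the purpose above describes.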
Dataset-dependent statistics (from DataLab)
- Purpose: from a DataLab dataset split, get the resources necessary to calculate statistics. (This is the original description from @neubig; I would like to know more about your original intention.)
- Caching:
  - [ ] store it in the local filesystem (e.g., dataset["split"]._stat or dataset["split"][feature_name])
- SDK function
Overall statistics
- Purpose: a package of overall statistics about the system output, including its performance
- SDK function
Fine-grained statistics
- Purpose: a package of `_bucketing_samples` results
- SDK function
@neubig @OscarWang114 (it seems the definitions of the last two statistics are a little different from the first two; maybe we could find a better way of naming all of them)