
Document the concept of `statistics` [Discussion]

Open • pfliu-nlp opened this issue 2 years ago • 0 comments

Statistics will likely be an important concept throughout the project, so making them well-organized and documented will benefit both developers and users. (Also, caching statistics from different scenarios is one of the valuable features of ExplainaBoard.) Generally, there are the following scenarios:

Statistics of the training set

  • Purpose: it is costly to calculate training-set-dependent features on the fly, so we need a cacheable object that stores the important statistics of a dataset to speed up the calculation of those features.
  • Caching strategies:
    • [x] store it in the local filesystem (e.g., dataset["train"]._stat)
    • [x] store it in DB (metadata.statistics)
    • [ ] store it in S3 and put the S3 link in DB
  • SDK function
    • [_gen_external_stats](https://github.com/neulab/ExplainaBoard/blob/c27c3391b7090f8be41ac076bc88143ea90623e7/explainaboard/processors/processor.py#L48)
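A minimal sketch of the "compute once, cache, reuse" pattern for training-set statistics, assuming whitespace tokenization and a local JSON file as the cache. The function names (`compute_train_stats`, `load_or_compute_train_stats`, `avg_train_freq`), the cache path, and the token-frequency feature are hypothetical illustrations, not ExplainaBoard's `_gen_external_stats` API.

```python
import json
from collections import Counter
from pathlib import Path


def compute_train_stats(train_texts):
    """Hypothetical expensive pass: token frequencies over the whole training set."""
    counts = Counter()
    for text in train_texts:
        counts.update(text.split())  # naive whitespace tokenization (assumption)
    return {"vocab": dict(counts), "total_tokens": sum(counts.values())}


def load_or_compute_train_stats(train_texts, cache_path):
    """Return cached stats if the cache file exists; otherwise compute and cache them."""
    path = Path(cache_path)
    if path.exists():
        return json.loads(path.read_text())
    stats = compute_train_stats(train_texts)
    path.write_text(json.dumps(stats))
    return stats


def avg_train_freq(text, stats):
    """Hypothetical training-set-dependent feature: mean relative
    training-set frequency of the tokens in one sample."""
    tokens = text.split()
    if not tokens or stats["total_tokens"] == 0:
        return 0.0
    total = stats["total_tokens"]
    return sum(stats["vocab"].get(tok, 0) / total for tok in tokens) / len(tokens)
```

For example, with training texts `["a b a", "b c"]`, `avg_train_freq("a c", stats)` returns roughly 0.3, and a second call to `load_or_compute_train_stats` with the same path reads the JSON file instead of re-scanning the training set. Moving the cache to a DB or S3 would only change where that function reads and writes.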

Statistics for scoring

  • Purpose: for some metrics (e.g., those for text generation), we usually need to cache intermediate statistics for each sample (e.g., n-gram overlaps) so that downstream computations such as non-composable overall evaluation scores, confidence intervals, or significance tests can be performed efficiently.
  • Caching strategies:
    • [ ] store it in an in-memory dict? Should EaaS generate it and pass it to the ExplainaBoard processor?
  • SDK function
    • [_gen_scoring_stats](https://github.com/neulab/ExplainaBoard/blob/c27c3391b7090f8be41ac076bc88143ea90623e7/explainaboard/processors/conditional_generation.py#L109)
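A minimal sketch of why per-sample scoring statistics are worth caching, using clipped unigram precision as a simplified stand-in for BLEU-style n-gram overlap. Each sample is reduced to `(matches, hypothesis_length)`; the corpus score sums these counts (so it is not the average of per-sample scores), and a percentile bootstrap resamples the cached tuples without re-tokenizing anything. The function names are hypothetical and unrelated to `_gen_scoring_stats`.

```python
import random
from collections import Counter


def unigram_stats(hypothesis, reference):
    """Per-sample sufficient statistics: (clipped unigram matches, hypothesis length)."""
    hyp = Counter(hypothesis.split())
    ref = Counter(reference.split())
    # Clip each hypothesis token count by its count in the reference.
    matches = sum(min(count, ref[tok]) for tok, count in hyp.items())
    return matches, sum(hyp.values())


def corpus_precision(stats):
    """Corpus-level score from cached stats: total matches / total length.
    This is NOT the mean of per-sample precisions (non-composable)."""
    matches = sum(m for m, _ in stats)
    total = sum(n for _, n in stats)
    return matches / total if total else 0.0


def bootstrap_ci(stats, n_samples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI that only resamples the cached tuples."""
    rng = random.Random(seed)
    scores = sorted(
        corpus_precision([rng.choice(stats) for _ in stats])
        for _ in range(n_samples)
    )
    k = int(n_samples * alpha / 2)
    return scores[k], scores[n_samples - 1 - k]
```

With the two samples `("the cat sat", "the cat sat down")` and `("a dog", "the cat")`, the cached stats are `(3, 3)` and `(0, 2)`, so the corpus precision is 3/5 = 0.6, whereas averaging the per-sample precisions would give 0.5.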

Dataset-dependent statistics (from DataLab)

  • Purpose: from a DataLab dataset split, get the resources necessary to calculate statistics. (This is the original description from @neubig; I'd like to know more about your original intention.)
  • Caching strategies:
    • [ ] store it in the local filesystem (e.g., dataset["split"]._stat or dataset["split"][feature_name])
  • SDK function

Overall statistics

  • Purpose: a package of overall statistics about the system output, including its performance
  • SDK function
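One possible shape for such a package, sketched as a Python dataclass. The class name and its fields (`system_name`, `num_samples`, `performance`, `confidence_intervals`) are assumptions chosen for illustration, not an existing ExplainaBoard type.

```python
from dataclasses import asdict, dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class OverallStatistics:
    """Hypothetical package of system-level statistics for one system output."""
    system_name: str
    num_samples: int
    # Metric name -> overall score, e.g. {"precision": 0.6}
    performance: Dict[str, float] = field(default_factory=dict)
    # Metric name -> (lower, upper) confidence bounds, when available
    confidence_intervals: Dict[str, Tuple[float, float]] = field(default_factory=dict)

    def add_metric(self, name: str, value: float,
                   ci: Optional[Tuple[float, float]] = None) -> None:
        """Record one overall metric and, optionally, its confidence interval."""
        self.performance[name] = value
        if ci is not None:
            self.confidence_intervals[name] = ci

    def to_dict(self) -> dict:
        """Convert to plain dicts/tuples, e.g. for storing in a DB record."""
        return asdict(self)
```

Because `to_dict` returns plain Python containers, the package could be serialized with `json.dumps` and stored next to other cached statistics.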

Fine-grained statistics

@neubig @OscarWang114 It seems the last two kinds of statistics are defined a little differently from the first two; maybe we could find a better way to name all of them.

pfliu-nlp, Mar 22 '22 20:03