Jeremy D

14 issues by Jeremy D

This PR is stacked on top of the migration PR https://github.com/mosaicml/llm-foundry/pull/936. It does five things: 1. Refactor the CodeEval and QA tasks to have a shared superclass called InContextLearningGenerationTaskDataset. 2. Rename...
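A minimal sketch of the shared-superclass pattern in item 1, assuming a Python dataset-style interface. Every name here (`GenerationTaskBase`, `build_prompt`, `reference`, and the `question`/`answer`/`prompt`/`canonical_solution` fields) is hypothetical. This is not the llm-foundry code for InContextLearningGenerationTaskDataset. It only illustrates how the shared indexing logic can live in one base class while each task supplies its own formatting.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class GenerationTaskBase(ABC):
    """Hypothetical base class holding the plumbing shared by generation-style tasks."""

    def __init__(self, examples: List[Dict[str, str]]):
        self.examples = examples

    def __len__(self) -> int:
        return len(self.examples)

    def __getitem__(self, idx: int) -> Dict[str, str]:
        # Shared logic: every subclass yields a prompt plus a reference answer.
        example = self.examples[idx]
        return {"prompt": self.build_prompt(example), "answer": self.reference(example)}

    @abstractmethod
    def build_prompt(self, example: Dict[str, str]) -> str:
        """Task-specific prompt formatting."""

    @abstractmethod
    def reference(self, example: Dict[str, str]) -> str:
        """Task-specific reference answer used for grading."""


class QATask(GenerationTaskBase):
    def build_prompt(self, example: Dict[str, str]) -> str:
        return f"Q: {example['question']}\nA:"

    def reference(self, example: Dict[str, str]) -> str:
        return example["answer"]


class CodeEvalTask(GenerationTaskBase):
    def build_prompt(self, example: Dict[str, str]) -> str:
        # Code tasks prompt with the function signature/docstring as-is.
        return example["prompt"]

    def reference(self, example: Dict[str, str]) -> str:
        return example["canonical_solution"]
```

With this split, adding another generation task means writing only the two task-specific methods; length and indexing are inherited.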

Adds the BIG-Bench Hard subset as a set of combined chain-of-thought (CoT) tasks, formatted according to the specification in [this repo](https://github.com/suzgunmirac/BIG-Bench-Hard/tree/main). These tasks are quite large and slow to run. I don't...
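A hedged sketch of CoT prompt formatting and answer extraction, assuming the BIG-Bench Hard convention of few-shot exemplars written as `Q: ...` followed by `A: Let's think step by step. ...` and ending with `So the answer is X.`. The helpers `build_cot_prompt` and `extract_answer` are hypothetical; they are not part of llm-foundry or the BBH repo.

```python
import re
from typing import Optional

# Assumed BBH-style answer marker; the answer is confined to one line,
# with an optional trailing period dropped.
_ANSWER_RE = re.compile(r"So the answer is (.*?)\.?(?:\n|$)")


def build_cot_prompt(few_shot_prefix: str, question: str) -> str:
    """Append a new question after the few-shot CoT exemplars and cue step-by-step reasoning."""
    return f"{few_shot_prefix.rstrip()}\n\nQ: {question}\nA: Let's think step by step."


def extract_answer(generation: str) -> Optional[str]:
    """Return the text after the first 'So the answer is', or None if the marker is missing."""
    match = _ANSWER_RE.search(generation)
    return match.group(1).strip() if match else None
```

Taking the first marker matters because a model may keep generating further `Q:`/`A:` pairs after answering.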

Implements F1 score for reference-based grading of QA tasks. This PR depends on Max's [refactor](https://github.com/mosaicml/composer/pull/2713). It also adds QuAC, Natural Questions, and NarrativeQA. Tested on mpt-7b-instruct: ``` | Category | Benchmark...
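A minimal sketch of reference-based F1 grading, assuming the common SQuAD-style token-overlap F1 (lowercase, strip punctuation and the articles a/an/the, then compare token multisets) and taking the best score over all reference answers. Whether the PR uses exactly this normalization is an assumption; the function names are hypothetical.

```python
import re
import string
from collections import Counter
from typing import Sequence

_PUNCT = set(string.punctuation)


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in _PUNCT)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def token_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1 between one prediction and one reference."""
    pred_tokens = normalize_answer(prediction).split()
    ref_tokens = normalize_answer(reference).split()
    if not pred_tokens or not ref_tokens:
        # Two empty answers match; one empty answer shares nothing with the other.
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most min(count) times.
    num_same = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def max_f1(prediction: str, references: Sequence[str]) -> float:
    """Grade against several acceptable answers and keep the best match."""
    return max(token_f1(prediction, ref) for ref in references)
```

For example, `token_f1("Paris, France", "paris")` gives precision 1/2 and recall 1, so F1 is 2/3: partial credit that exact-match grading would score as 0.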

The Brier score (lower is better) seems of questionable usefulness. COPA results: the first number for each model is the Brier score. Below we find that accuracy AND Brier score both go up with model size...
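A minimal sketch of how a multiple-choice Brier score can be computed alongside accuracy, assuming each choice's probability comes from a softmax over the per-choice log-likelihoods and using the multi-class form (sum of squared errors against the one-hot gold label, averaged over examples). Some definitions also divide by the number of choices; the harness may differ, and all names here are hypothetical.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one example's per-choice scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def brier_score(choice_logprobs: Sequence[Sequence[float]], gold: Sequence[int]) -> float:
    """Mean multi-class Brier score; 0 is perfect and lower is better."""
    total = 0.0
    for scores, answer in zip(choice_logprobs, gold):
        probs = softmax(scores)
        # Squared distance between the predicted distribution and the one-hot label.
        total += sum((p - (1.0 if k == answer else 0.0)) ** 2 for k, p in enumerate(probs))
    return total / len(gold)


def accuracy(choice_logprobs: Sequence[Sequence[float]], gold: Sequence[int]) -> float:
    """Fraction of examples whose highest-scoring choice is the gold one."""
    correct = sum(
        max(range(len(scores)), key=lambda k: scores[k]) == answer
        for scores, answer in zip(choice_logprobs, gold)
    )
    return correct / len(gold)
```

On a two-choice task like COPA, a model that always splits probability 50/50 scores 0.5, so a better-calibrated, more accurate model would be expected to push the Brier score down, not up.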