Verify icl cfgs
Confirmed fp16 is slightly better than bf16. I also edited the eval script to compute averages across benchmarks that have sub-scores and to log the results table in markdown format.
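For reference, here is a minimal sketch of the averaging and markdown logging described above; it is not the exact diff in this PR, and the `results` layout and `build_markdown_table` helper are illustrative assumptions rather than llm-foundry APIs. It assumes an unweighted mean over subcategories.

```python
# Hypothetical sketch of per-benchmark averaging + markdown table logging.
# `results` maps (benchmark, subcategory) -> accuracy; subcategory is '' when
# a benchmark has no sub-scores. Not the actual llm-foundry implementation.
from collections import defaultdict

def build_markdown_table(results, model_name, num_fewshot=0):
    # Group subcategory scores under their parent benchmark.
    grouped = defaultdict(dict)
    for (benchmark, subcat), acc in results.items():
        grouped[benchmark][subcat] = acc

    rows = ['| Benchmark | Subcategory | Accuracy | Number few shot | Model |',
            '|---|---|---|---|---|']
    for benchmark, subscores in grouped.items():
        if len(subscores) > 1:
            # Benchmarks with sub-scores get an unweighted average row first.
            avg = sum(subscores.values()) / len(subscores)
            rows.append(f'| {benchmark} | Average | {avg:.6f} | {num_fewshot} | {model_name} |')
            for subcat, acc in sorted(subscores.items()):
                rows.append(f'| | {subcat} | {acc:.6f} | {num_fewshot} | {model_name} |')
        else:
            acc = next(iter(subscores.values()))
            rows.append(f'| {benchmark} | | {acc:.6f} | {num_fewshot} | {model_name} |')
    return '\n'.join(rows)
```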
**Run with amp_fp16:**
| Benchmark | Subcategory | Accuracy | Number few shot | Model |
|---|---|---|---|---|
| jeopardy | Average | 0.279767 | 0 | mosaicml/mpt-7b |
| | american_history | 0.365617 | 0 | mosaicml/mpt-7b |
| | literature | 0.318367 | 0 | mosaicml/mpt-7b |
| | science | 0.138655 | 0 | mosaicml/mpt-7b |
| | word_origins | 0.115068 | 0 | mosaicml/mpt-7b |
| | world_history | 0.461126 | 0 | mosaicml/mpt-7b |
| lambada_openai | | 0.70328 | 0 | mosaicml/mpt-7b |
| piqa | | 0.799238 | 0 | mosaicml/mpt-7b |
| hellaswag | | 0.761701 | 0 | mosaicml/mpt-7b |
| arc_easy | | 0.67298 | 0 | mosaicml/mpt-7b |
| arc_challenge | | 0.396758 | 0 | mosaicml/mpt-7b |
| copa | | 0.8 | 0 | mosaicml/mpt-7b |
| boolq | | 0.748012 | 0 | mosaicml/mpt-7b |
| mmlu | Average | 0.293343 | 0 | mosaicml/mpt-7b |
| | abstract_algebra | 0.32 | 0 | mosaicml/mpt-7b |
| | anatomy | 0.355556 | 0 | mosaicml/mpt-7b |
| | astronomy | 0.256579 | 0 | mosaicml/mpt-7b |
| | business_ethics | 0.28 | 0 | mosaicml/mpt-7b |
| | clinical_knowledge | 0.30566 | 0 | mosaicml/mpt-7b |
| | college_biology | 0.305556 | 0 | mosaicml/mpt-7b |
| | college_chemistry | 0.21 | 0 | mosaicml/mpt-7b |
| | college_computer_science | 0.27 | 0 | mosaicml/mpt-7b |
| | college_mathematics | 0.27 | 0 | mosaicml/mpt-7b |
| | college_medicine | 0.289017 | 0 | mosaicml/mpt-7b |
| | college_physics | 0.22549 | 0 | mosaicml/mpt-7b |
| | computer_security | 0.33 | 0 | mosaicml/mpt-7b |
| | conceptual_physics | 0.251064 | 0 | mosaicml/mpt-7b |
| | econometrics | 0.263158 | 0 | mosaicml/mpt-7b |
| | electrical_engineering | 0.310345 | 0 | mosaicml/mpt-7b |
| | elementary_mathematics | 0.304233 | 0 | mosaicml/mpt-7b |
| | formal_logic | 0.277778 | 0 | mosaicml/mpt-7b |
| | global_facts | 0.34 | 0 | mosaicml/mpt-7b |
| | high_school_biology | 0.306452 | 0 | mosaicml/mpt-7b |
| | high_school_chemistry | 0.285714 | 0 | mosaicml/mpt-7b |
| | high_school_computer_science | 0.33 | 0 | mosaicml/mpt-7b |
| | high_school_european_history | 0.254545 | 0 | mosaicml/mpt-7b |
| | high_school_geography | 0.333333 | 0 | mosaicml/mpt-7b |
| | high_school_government_and_politics | 0.321244 | 0 | mosaicml/mpt-7b |
| | high_school_macroeconomics | 0.266667 | 0 | mosaicml/mpt-7b |
| | high_school_mathematics | 0.240741 | 0 | mosaicml/mpt-7b |
| | high_school_microeconomics | 0.268908 | 0 | mosaicml/mpt-7b |
| | high_school_physics | 0.271523 | 0 | mosaicml/mpt-7b |
| | high_school_psychology | 0.286239 | 0 | mosaicml/mpt-7b |
| | high_school_statistics | 0.185185 | 0 | mosaicml/mpt-7b |
| | high_school_us_history | 0.308824 | 0 | mosaicml/mpt-7b |
| | high_school_world_history | 0.2827 | 0 | mosaicml/mpt-7b |
| | human_aging | 0.246637 | 0 | mosaicml/mpt-7b |
| | human_sexuality | 0.274809 | 0 | mosaicml/mpt-7b |
| | international_law | 0.413223 | 0 | mosaicml/mpt-7b |
| | jurisprudence | 0.324074 | 0 | mosaicml/mpt-7b |
| | logical_fallacies | 0.368098 | 0 | mosaicml/mpt-7b |
| | machine_learning | 0.267857 | 0 | mosaicml/mpt-7b |
| | management | 0.320388 | 0 | mosaicml/mpt-7b |
| | marketing | 0.324786 | 0 | mosaicml/mpt-7b |
| | medical_genetics | 0.26 | 0 | mosaicml/mpt-7b |
| | miscellaneous | 0.374202 | 0 | mosaicml/mpt-7b |
| | moral_disputes | 0.315029 | 0 | mosaicml/mpt-7b |
| | moral_scenarios | 0.248045 | 0 | mosaicml/mpt-7b |
| | nutrition | 0.310458 | 0 | mosaicml/mpt-7b |
| | philosophy | 0.33119 | 0 | mosaicml/mpt-7b |
| | prehistory | 0.345679 | 0 | mosaicml/mpt-7b |
| | professional_accounting | 0.276596 | 0 | mosaicml/mpt-7b |
| | professional_law | 0.290091 | 0 | mosaicml/mpt-7b |
| | professional_medicine | 0.205882 | 0 | mosaicml/mpt-7b |
| | professional_psychology | 0.289216 | 0 | mosaicml/mpt-7b |
| | public_relations | 0.272727 | 0 | mosaicml/mpt-7b |
| | security_studies | 0.236735 | 0 | mosaicml/mpt-7b |
| | sociology | 0.293532 | 0 | mosaicml/mpt-7b |
| | us_foreign_policy | 0.35 | 0 | mosaicml/mpt-7b |
| | virology | 0.277108 | 0 | mosaicml/mpt-7b |
| | world_religions | 0.397661 | 0 | mosaicml/mpt-7b |
| winograd | | 0.868132 | 0 | mosaicml/mpt-7b |
| winogrande | | 0.685083 | 0 | mosaicml/mpt-7b |
| triviaqa | | 0.343057 | 0 | mosaicml/mpt-7b |
**Run with amp_bf16:**
| Benchmark | Subcategory | Accuracy | Number few shot | Model |
|---|---|---|---|---|
| jeopardy | Average | 0.273737 | 0 | mosaicml/mpt-7b |
| | american_history | 0.355932 | 0 | mosaicml/mpt-7b |
| | literature | 0.308163 | 0 | mosaicml/mpt-7b |
| | science | 0.136555 | 0 | mosaicml/mpt-7b |
| | word_origins | 0.109589 | 0 | mosaicml/mpt-7b |
| | world_history | 0.458445 | 0 | mosaicml/mpt-7b |
| lambada_openai | | 0.686202 | 0 | mosaicml/mpt-7b |
| piqa | | 0.799238 | 0 | mosaicml/mpt-7b |
| hellaswag | | 0.762199 | 0 | mosaicml/mpt-7b |
| arc_easy | | 0.673401 | 0 | mosaicml/mpt-7b |
| arc_challenge | | 0.391638 | 0 | mosaicml/mpt-7b |
| copa | | 0.8 | 0 | mosaicml/mpt-7b |
| boolq | | 0.739144 | 0 | mosaicml/mpt-7b |
| mmlu | Average | 0.292015 | 0 | mosaicml/mpt-7b |
| | abstract_algebra | 0.3 | 0 | mosaicml/mpt-7b |
| | anatomy | 0.407407 | 0 | mosaicml/mpt-7b |
| | astronomy | 0.269737 | 0 | mosaicml/mpt-7b |
| | business_ethics | 0.3 | 0 | mosaicml/mpt-7b |
| | clinical_knowledge | 0.309434 | 0 | mosaicml/mpt-7b |
| | college_biology | 0.319444 | 0 | mosaicml/mpt-7b |
| | college_chemistry | 0.2 | 0 | mosaicml/mpt-7b |
| | college_computer_science | 0.25 | 0 | mosaicml/mpt-7b |
| | college_mathematics | 0.25 | 0 | mosaicml/mpt-7b |
| | college_medicine | 0.294798 | 0 | mosaicml/mpt-7b |
| | college_physics | 0.215686 | 0 | mosaicml/mpt-7b |
| | computer_security | 0.34 | 0 | mosaicml/mpt-7b |
| | conceptual_physics | 0.238298 | 0 | mosaicml/mpt-7b |
| | econometrics | 0.263158 | 0 | mosaicml/mpt-7b |
| | electrical_engineering | 0.317241 | 0 | mosaicml/mpt-7b |
| | elementary_mathematics | 0.304233 | 0 | mosaicml/mpt-7b |
| | formal_logic | 0.293651 | 0 | mosaicml/mpt-7b |
| | global_facts | 0.34 | 0 | mosaicml/mpt-7b |
| | high_school_biology | 0.264516 | 0 | mosaicml/mpt-7b |
| | high_school_chemistry | 0.295566 | 0 | mosaicml/mpt-7b |
| | high_school_computer_science | 0.33 | 0 | mosaicml/mpt-7b |
| | high_school_european_history | 0.260606 | 0 | mosaicml/mpt-7b |
| | high_school_geography | 0.313131 | 0 | mosaicml/mpt-7b |
| | high_school_government_and_politics | 0.310881 | 0 | mosaicml/mpt-7b |
| | high_school_macroeconomics | 0.264103 | 0 | mosaicml/mpt-7b |
| | high_school_mathematics | 0.255556 | 0 | mosaicml/mpt-7b |
| | high_school_microeconomics | 0.273109 | 0 | mosaicml/mpt-7b |
| | high_school_physics | 0.264901 | 0 | mosaicml/mpt-7b |
| | high_school_psychology | 0.26055 | 0 | mosaicml/mpt-7b |
| | high_school_statistics | 0.212963 | 0 | mosaicml/mpt-7b |
| | high_school_us_history | 0.269608 | 0 | mosaicml/mpt-7b |
| | high_school_world_history | 0.291139 | 0 | mosaicml/mpt-7b |
| | human_aging | 0.273543 | 0 | mosaicml/mpt-7b |
| | human_sexuality | 0.290076 | 0 | mosaicml/mpt-7b |
| | international_law | 0.363636 | 0 | mosaicml/mpt-7b |
| | jurisprudence | 0.351852 | 0 | mosaicml/mpt-7b |
| | logical_fallacies | 0.300613 | 0 | mosaicml/mpt-7b |
| | machine_learning | 0.285714 | 0 | mosaicml/mpt-7b |
| | management | 0.300971 | 0 | mosaicml/mpt-7b |
| | marketing | 0.376068 | 0 | mosaicml/mpt-7b |
| | medical_genetics | 0.32 | 0 | mosaicml/mpt-7b |
| | miscellaneous | 0.355045 | 0 | mosaicml/mpt-7b |
| | moral_disputes | 0.320809 | 0 | mosaicml/mpt-7b |
| | moral_scenarios | 0.240223 | 0 | mosaicml/mpt-7b |
| | nutrition | 0.300654 | 0 | mosaicml/mpt-7b |
| | philosophy | 0.315113 | 0 | mosaicml/mpt-7b |
| | prehistory | 0.345679 | 0 | mosaicml/mpt-7b |
| | professional_accounting | 0.27305 | 0 | mosaicml/mpt-7b |
| | professional_law | 0.260104 | 0 | mosaicml/mpt-7b |
| | professional_medicine | 0.224265 | 0 | mosaicml/mpt-7b |
| | professional_psychology | 0.295752 | 0 | mosaicml/mpt-7b |
| | public_relations | 0.3 | 0 | mosaicml/mpt-7b |
| | security_studies | 0.2 | 0 | mosaicml/mpt-7b |
| | sociology | 0.283582 | 0 | mosaicml/mpt-7b |
| | us_foreign_policy | 0.32 | 0 | mosaicml/mpt-7b |
| | virology | 0.259036 | 0 | mosaicml/mpt-7b |
| | world_religions | 0.409357 | 0 | mosaicml/mpt-7b |
| winograd | | 0.868132 | 0 | mosaicml/mpt-7b |
| winogrande | | 0.685872 | 0 | mosaicml/mpt-7b |
| triviaqa | | 0.336781 | 0 | mosaicml/mpt-7b |