Verify icl cfgs

Open bmosaicml opened this issue 1 year ago • 0 comments

Confirmed fp16 is slightly better than bf16. I also edited the eval script to compute averages across benchmarks with sub-scores and to log the results table in markdown format.
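The averaging and markdown logging described above can be sketched roughly as follows. This is a minimal illustration, not the actual change to the eval script: the function names `build_rows` and `to_markdown` and the input layout (a dict keyed by `(benchmark, subcategory)`) are assumptions made for the example. It uses an unweighted mean over subcategories, which is consistent with the reported jeopardy average (the mean of its five sub-scores).

```python
from collections import defaultdict


def build_rows(scores):
    """Group scores by benchmark and add an 'Average' row where sub-scores exist.

    `scores` maps (benchmark, subcategory) -> accuracy; subcategory is None
    for benchmarks without sub-scores. This layout is hypothetical.
    """
    grouped = defaultdict(dict)
    for (bench, sub), acc in scores.items():
        grouped[bench][sub] = acc

    rows = []
    for bench, subs in grouped.items():
        if None in subs:
            # Benchmark without sub-scores: a single row, empty subcategory.
            rows.append((bench, "", subs[None]))
            continue
        # Unweighted mean over the subcategories.
        avg = sum(subs.values()) / len(subs)
        rows.append((bench, "Average", avg))
        rows.extend(("", sub, acc) for sub, acc in subs.items())
    return rows


def to_markdown(rows, num_fewshot, model):
    """Render the rows as a markdown table string."""
    lines = [
        "| Benchmark | Subcategory | Accuracy | Number few shot | Model |",
        "| --- | --- | --- | --- | --- |",
    ]
    for bench, sub, acc in rows:
        lines.append(f"| {bench} | {sub} | {acc:.6g} | {num_fewshot} | {model} |")
    return "\n".join(lines)


# Sub-scores taken from the amp_fp16 run reported in this issue.
example_scores = {
    ("jeopardy", "american_history"): 0.365617,
    ("jeopardy", "literature"): 0.318367,
    ("jeopardy", "science"): 0.138655,
    ("jeopardy", "word_origins"): 0.115068,
    ("jeopardy", "world_history"): 0.461126,
    ("piqa", None): 0.799238,
}
example_table = to_markdown(build_rows(example_scores), 0, "mosaicml/mpt-7b")
print(example_table)
```

Keeping the row construction separate from the rendering means the same rows could be logged in another format without recomputing the averages.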

Run with amp_fp16:

| Benchmark | Subcategory | Accuracy | Number few shot | Model |
| --- | --- | --- | --- | --- |
| jeopardy | Average | 0.279767 | 0 | mosaicml/mpt-7b |
| | american_history | 0.365617 | 0 | mosaicml/mpt-7b |
| | literature | 0.318367 | 0 | mosaicml/mpt-7b |
| | science | 0.138655 | 0 | mosaicml/mpt-7b |
| | word_origins | 0.115068 | 0 | mosaicml/mpt-7b |
| | world_history | 0.461126 | 0 | mosaicml/mpt-7b |
| lambada_openai | | 0.70328 | 0 | mosaicml/mpt-7b |
| piqa | | 0.799238 | 0 | mosaicml/mpt-7b |
| hellaswag | | 0.761701 | 0 | mosaicml/mpt-7b |
| arc_easy | | 0.67298 | 0 | mosaicml/mpt-7b |
| arc_challenge | | 0.396758 | 0 | mosaicml/mpt-7b |
| copa | | 0.8 | 0 | mosaicml/mpt-7b |
| boolq | | 0.748012 | 0 | mosaicml/mpt-7b |
| mmlu | Average | 0.293343 | 0 | mosaicml/mpt-7b |
| | abstract_algebra | 0.32 | 0 | mosaicml/mpt-7b |
| | anatomy | 0.355556 | 0 | mosaicml/mpt-7b |
| | astronomy | 0.256579 | 0 | mosaicml/mpt-7b |
| | business_ethics | 0.28 | 0 | mosaicml/mpt-7b |
| | clinical_knowledge | 0.30566 | 0 | mosaicml/mpt-7b |
| | college_biology | 0.305556 | 0 | mosaicml/mpt-7b |
| | college_chemistry | 0.21 | 0 | mosaicml/mpt-7b |
| | college_computer_science | 0.27 | 0 | mosaicml/mpt-7b |
| | college_mathematics | 0.27 | 0 | mosaicml/mpt-7b |
| | college_medicine | 0.289017 | 0 | mosaicml/mpt-7b |
| | college_physics | 0.22549 | 0 | mosaicml/mpt-7b |
| | computer_security | 0.33 | 0 | mosaicml/mpt-7b |
| | conceptual_physics | 0.251064 | 0 | mosaicml/mpt-7b |
| | econometrics | 0.263158 | 0 | mosaicml/mpt-7b |
| | electrical_engineering | 0.310345 | 0 | mosaicml/mpt-7b |
| | elementary_mathematics | 0.304233 | 0 | mosaicml/mpt-7b |
| | formal_logic | 0.277778 | 0 | mosaicml/mpt-7b |
| | global_facts | 0.34 | 0 | mosaicml/mpt-7b |
| | high_school_biology | 0.306452 | 0 | mosaicml/mpt-7b |
| | high_school_chemistry | 0.285714 | 0 | mosaicml/mpt-7b |
| | high_school_computer_science | 0.33 | 0 | mosaicml/mpt-7b |
| | high_school_european_history | 0.254545 | 0 | mosaicml/mpt-7b |
| | high_school_geography | 0.333333 | 0 | mosaicml/mpt-7b |
| | high_school_government_and_politics | 0.321244 | 0 | mosaicml/mpt-7b |
| | high_school_macroeconomics | 0.266667 | 0 | mosaicml/mpt-7b |
| | high_school_mathematics | 0.240741 | 0 | mosaicml/mpt-7b |
| | high_school_microeconomics | 0.268908 | 0 | mosaicml/mpt-7b |
| | high_school_physics | 0.271523 | 0 | mosaicml/mpt-7b |
| | high_school_psychology | 0.286239 | 0 | mosaicml/mpt-7b |
| | high_school_statistics | 0.185185 | 0 | mosaicml/mpt-7b |
| | high_school_us_history | 0.308824 | 0 | mosaicml/mpt-7b |
| | high_school_world_history | 0.2827 | 0 | mosaicml/mpt-7b |
| | human_aging | 0.246637 | 0 | mosaicml/mpt-7b |
| | human_sexuality | 0.274809 | 0 | mosaicml/mpt-7b |
| | international_law | 0.413223 | 0 | mosaicml/mpt-7b |
| | jurisprudence | 0.324074 | 0 | mosaicml/mpt-7b |
| | logical_fallacies | 0.368098 | 0 | mosaicml/mpt-7b |
| | machine_learning | 0.267857 | 0 | mosaicml/mpt-7b |
| | management | 0.320388 | 0 | mosaicml/mpt-7b |
| | marketing | 0.324786 | 0 | mosaicml/mpt-7b |
| | medical_genetics | 0.26 | 0 | mosaicml/mpt-7b |
| | miscellaneous | 0.374202 | 0 | mosaicml/mpt-7b |
| | moral_disputes | 0.315029 | 0 | mosaicml/mpt-7b |
| | moral_scenarios | 0.248045 | 0 | mosaicml/mpt-7b |
| | nutrition | 0.310458 | 0 | mosaicml/mpt-7b |
| | philosophy | 0.33119 | 0 | mosaicml/mpt-7b |
| | prehistory | 0.345679 | 0 | mosaicml/mpt-7b |
| | professional_accounting | 0.276596 | 0 | mosaicml/mpt-7b |
| | professional_law | 0.290091 | 0 | mosaicml/mpt-7b |
| | professional_medicine | 0.205882 | 0 | mosaicml/mpt-7b |
| | professional_psychology | 0.289216 | 0 | mosaicml/mpt-7b |
| | public_relations | 0.272727 | 0 | mosaicml/mpt-7b |
| | security_studies | 0.236735 | 0 | mosaicml/mpt-7b |
| | sociology | 0.293532 | 0 | mosaicml/mpt-7b |
| | us_foreign_policy | 0.35 | 0 | mosaicml/mpt-7b |
| | virology | 0.277108 | 0 | mosaicml/mpt-7b |
| | world_religions | 0.397661 | 0 | mosaicml/mpt-7b |
| winograd | | 0.868132 | 0 | mosaicml/mpt-7b |
| winogrande | | 0.685083 | 0 | mosaicml/mpt-7b |
| triviaqa | | 0.343057 | 0 | mosaicml/mpt-7b |

Run with amp_bf16:

| Benchmark | Subcategory | Accuracy | Number few shot | Model |
| --- | --- | --- | --- | --- |
| jeopardy | Average | 0.273737 | 0 | mosaicml/mpt-7b |
| | american_history | 0.355932 | 0 | mosaicml/mpt-7b |
| | literature | 0.308163 | 0 | mosaicml/mpt-7b |
| | science | 0.136555 | 0 | mosaicml/mpt-7b |
| | word_origins | 0.109589 | 0 | mosaicml/mpt-7b |
| | world_history | 0.458445 | 0 | mosaicml/mpt-7b |
| lambada_openai | | 0.686202 | 0 | mosaicml/mpt-7b |
| piqa | | 0.799238 | 0 | mosaicml/mpt-7b |
| hellaswag | | 0.762199 | 0 | mosaicml/mpt-7b |
| arc_easy | | 0.673401 | 0 | mosaicml/mpt-7b |
| arc_challenge | | 0.391638 | 0 | mosaicml/mpt-7b |
| copa | | 0.8 | 0 | mosaicml/mpt-7b |
| boolq | | 0.739144 | 0 | mosaicml/mpt-7b |
| mmlu | Average | 0.292015 | 0 | mosaicml/mpt-7b |
| | abstract_algebra | 0.3 | 0 | mosaicml/mpt-7b |
| | anatomy | 0.407407 | 0 | mosaicml/mpt-7b |
| | astronomy | 0.269737 | 0 | mosaicml/mpt-7b |
| | business_ethics | 0.3 | 0 | mosaicml/mpt-7b |
| | clinical_knowledge | 0.309434 | 0 | mosaicml/mpt-7b |
| | college_biology | 0.319444 | 0 | mosaicml/mpt-7b |
| | college_chemistry | 0.2 | 0 | mosaicml/mpt-7b |
| | college_computer_science | 0.25 | 0 | mosaicml/mpt-7b |
| | college_mathematics | 0.25 | 0 | mosaicml/mpt-7b |
| | college_medicine | 0.294798 | 0 | mosaicml/mpt-7b |
| | college_physics | 0.215686 | 0 | mosaicml/mpt-7b |
| | computer_security | 0.34 | 0 | mosaicml/mpt-7b |
| | conceptual_physics | 0.238298 | 0 | mosaicml/mpt-7b |
| | econometrics | 0.263158 | 0 | mosaicml/mpt-7b |
| | electrical_engineering | 0.317241 | 0 | mosaicml/mpt-7b |
| | elementary_mathematics | 0.304233 | 0 | mosaicml/mpt-7b |
| | formal_logic | 0.293651 | 0 | mosaicml/mpt-7b |
| | global_facts | 0.34 | 0 | mosaicml/mpt-7b |
| | high_school_biology | 0.264516 | 0 | mosaicml/mpt-7b |
| | high_school_chemistry | 0.295566 | 0 | mosaicml/mpt-7b |
| | high_school_computer_science | 0.33 | 0 | mosaicml/mpt-7b |
| | high_school_european_history | 0.260606 | 0 | mosaicml/mpt-7b |
| | high_school_geography | 0.313131 | 0 | mosaicml/mpt-7b |
| | high_school_government_and_politics | 0.310881 | 0 | mosaicml/mpt-7b |
| | high_school_macroeconomics | 0.264103 | 0 | mosaicml/mpt-7b |
| | high_school_mathematics | 0.255556 | 0 | mosaicml/mpt-7b |
| | high_school_microeconomics | 0.273109 | 0 | mosaicml/mpt-7b |
| | high_school_physics | 0.264901 | 0 | mosaicml/mpt-7b |
| | high_school_psychology | 0.26055 | 0 | mosaicml/mpt-7b |
| | high_school_statistics | 0.212963 | 0 | mosaicml/mpt-7b |
| | high_school_us_history | 0.269608 | 0 | mosaicml/mpt-7b |
| | high_school_world_history | 0.291139 | 0 | mosaicml/mpt-7b |
| | human_aging | 0.273543 | 0 | mosaicml/mpt-7b |
| | human_sexuality | 0.290076 | 0 | mosaicml/mpt-7b |
| | international_law | 0.363636 | 0 | mosaicml/mpt-7b |
| | jurisprudence | 0.351852 | 0 | mosaicml/mpt-7b |
| | logical_fallacies | 0.300613 | 0 | mosaicml/mpt-7b |
| | machine_learning | 0.285714 | 0 | mosaicml/mpt-7b |
| | management | 0.300971 | 0 | mosaicml/mpt-7b |
| | marketing | 0.376068 | 0 | mosaicml/mpt-7b |
| | medical_genetics | 0.32 | 0 | mosaicml/mpt-7b |
| | miscellaneous | 0.355045 | 0 | mosaicml/mpt-7b |
| | moral_disputes | 0.320809 | 0 | mosaicml/mpt-7b |
| | moral_scenarios | 0.240223 | 0 | mosaicml/mpt-7b |
| | nutrition | 0.300654 | 0 | mosaicml/mpt-7b |
| | philosophy | 0.315113 | 0 | mosaicml/mpt-7b |
| | prehistory | 0.345679 | 0 | mosaicml/mpt-7b |
| | professional_accounting | 0.27305 | 0 | mosaicml/mpt-7b |
| | professional_law | 0.260104 | 0 | mosaicml/mpt-7b |
| | professional_medicine | 0.224265 | 0 | mosaicml/mpt-7b |
| | professional_psychology | 0.295752 | 0 | mosaicml/mpt-7b |
| | public_relations | 0.3 | 0 | mosaicml/mpt-7b |
| | security_studies | 0.2 | 0 | mosaicml/mpt-7b |
| | sociology | 0.283582 | 0 | mosaicml/mpt-7b |
| | us_foreign_policy | 0.32 | 0 | mosaicml/mpt-7b |
| | virology | 0.259036 | 0 | mosaicml/mpt-7b |
| | world_religions | 0.409357 | 0 | mosaicml/mpt-7b |
| winograd | | 0.868132 | 0 | mosaicml/mpt-7b |
| winogrande | | 0.685872 | 0 | mosaicml/mpt-7b |
| triviaqa | | 0.336781 | 0 | mosaicml/mpt-7b |
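
One way to make the fp16 vs. bf16 comparison concrete is to diff the top-level scores of the two runs. The sketch below copies the twelve benchmark-level accuracies from the tables above; the `compare` helper and the dict names are hypothetical and are not part of the eval script.

```python
# Top-level accuracies copied from the two runs reported above
# (for jeopardy and mmlu, the "Average" row is used).
FP16 = {
    "jeopardy": 0.279767, "lambada_openai": 0.70328, "piqa": 0.799238,
    "hellaswag": 0.761701, "arc_easy": 0.67298, "arc_challenge": 0.396758,
    "copa": 0.8, "boolq": 0.748012, "mmlu": 0.293343,
    "winograd": 0.868132, "winogrande": 0.685083, "triviaqa": 0.343057,
}
BF16 = {
    "jeopardy": 0.273737, "lambada_openai": 0.686202, "piqa": 0.799238,
    "hellaswag": 0.762199, "arc_easy": 0.673401, "arc_challenge": 0.391638,
    "copa": 0.8, "boolq": 0.739144, "mmlu": 0.292015,
    "winograd": 0.868132, "winogrande": 0.685872, "triviaqa": 0.336781,
}


def compare(a, b):
    """Return per-benchmark deltas (a - b) plus win/tie counts and the mean delta."""
    deltas = {name: a[name] - b[name] for name in a}
    wins_a = sum(d > 0 for d in deltas.values())
    wins_b = sum(d < 0 for d in deltas.values())
    ties = sum(d == 0 for d in deltas.values())
    mean_delta = sum(deltas.values()) / len(deltas)
    return deltas, wins_a, wins_b, ties, mean_delta


deltas, fp16_wins, bf16_wins, ties, mean_delta = compare(FP16, BF16)
# Print the largest fp16 advantage first.
for name, d in sorted(deltas.items(), key=lambda kv: -kv[1]):
    print(f"{name:16s} {d:+.6f}")
print(f"fp16 higher: {fp16_wins}, bf16 higher: {bf16_wins}, tied: {ties}")
print(f"mean delta (fp16 - bf16): {mean_delta:+.6f}")
```

The deltas come from a single run per precision, so small differences may be within run-to-run noise.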

bmosaicml • Jun 09 '23 15:06