
ShareGPT evals for various models

Open • arnocandel opened this issue 2 years ago • 1 comment

Related to https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard, but with more meaningful scores.

https://github.com/h2oai/h2ogpt/blob/ba6cad3207f8319b5c5f4b1e9099d7b909fdb661/generate.py#L1328-L1347

Models are listed in order from best to worst, based on 500 evals with the test above. The only per-model change was choosing the correct prompt type; everything else was kept the same.

gpt3.5

df_scores_500_500_1234_True_gpt35_

junelee/wizard-vicuna-13b

df_scores_500_500_1234_False_wizard-vicuna-13b_

openaccess-ai-collective/wizard-mega-13b

df_scores_500_500_1234_False_wizard-mega-13b_

ehartford/WizardLM-13B-Uncensored

df_scores_500_500_1234_False_WizardLM-13B-Uncensored_

AlekseyKorshuk/vicuna-7b

df_scores_500_500_1234_False_vicuna-7b_

TheBloke/stable-vicuna-13B-HF

df_scores_500_500_1234_False_stable-vicuna-13B-HF_

ehartford/WizardLM-7B-Uncensored

df_scores_500_500_1234_False_WizardLM-7B-Uncensored_

h2ogpt-oasst1-512-20b

df_scores_500_500_1234_False_h2ogpt-oasst1-512-20b_

h2ogpt-gm-oasst1-en-2048-open-llama-7b-preview-300bt-v2

df_scores_500_500_1234_false_h2ogpt-gm-oasst1-en-2048-open-llama-7b-preview-300bt-v2__720

h2ogpt-oasst1-512-12b

FIXME: was much better with --num_beams=1

df_scores_500_500_1234_False_h2ogpt-oasst1-512-12b_

h2ogpt-oig-oasst1-512-12b

df_scores_500_500_1234_False_h2ogpt-oig-oasst1-512-12b_

dolly-v2-12b

df_scores_500_500_1234_False_dolly-v2-12b_

h2ogpt-oig-oasst1-512-6.9b

df_scores_500_500_1234_False_h2ogpt-oig-oasst1-512-6 9b_
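A best-to-worst ordering like the one above can be produced by aggregating each model's per-response scores and sorting. The following is only a minimal sketch: it assumes the ranking is by mean score (the issue does not state which aggregate was used), and the function name `rank_models`, the model names, and the score values are all hypothetical placeholders, not data from these evals.

```python
from statistics import mean, median


def rank_models(scores_by_model):
    """Return (model, mean, median) tuples sorted from best to worst mean score.

    Assumption: a higher score is better and the mean is the ranking criterion.
    """
    summary = [
        (name, mean(scores), median(scores))
        for name, scores in scores_by_model.items()
        if scores  # skip models that have no scored responses
    ]
    return sorted(summary, key=lambda row: row[1], reverse=True)


# Placeholder scores for illustration only; these are NOT results from this issue.
example_scores = {
    "model_a": [0.9, 0.7, 0.8],
    "model_b": [0.4, 0.6, 0.5],
    "model_c": [0.95, 0.85, 0.9],
}

for name, avg, med in rank_models(example_scores):
    print(f"{name}: mean={avg:.3f} median={med:.3f}")
```

Reporting the median next to the mean helps show when a few outlier responses dominate a model's average.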

Lesson: WizardLM is great https://github.com/h2oai/h2ogpt/issues/96

arnocandel, May 11 '23 23:05