Tessa/callibration script

Open tbarton16 opened this issue 1 year ago • 2 comments

Here is the code we use to test our benchmark tasks. It runs a series of progressively more advanced models to see whether the benchmarks effectively differentiate between them, and at which number of shots they perform best.

  • Select an independent variable and a series of models that correspond to the settings of that variable
  • Select clusters
  • Edit the list of tasks in the base_callibration.yaml to reflect the ones you want to see
  • Run the launcher script
  • When everything is done, run the analyze_output notebook, which collates the results from wandb
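
To illustrate the kind of check the analysis step performs, here is a minimal sketch, not the notebook's actual code. It assumes results have already been pulled out of wandb into plain `(task, model, num_shots, score)` tuples. The function name `summarize`, the model names, the task name, and the scores are all hypothetical placeholders, and "best" is taken here to mean the shot count with the widest score gap between the weakest and strongest model, preferring settings where scores rise monotonically with model strength.

```python
from collections import defaultdict


def summarize(rows, model_order):
    """Pick, per task, the shot count that best separates the models.

    rows: iterable of (task, model, num_shots, score) tuples.
    model_order: model names listed from weakest to strongest.
    """
    # Group scores as task -> num_shots -> model -> score.
    by_task = defaultdict(dict)
    for task, model, shots, score in rows:
        by_task[task].setdefault(shots, {})[model] = score

    summary = {}
    for task, by_shots in by_task.items():
        best = None
        for shots, scores in sorted(by_shots.items()):
            ordered = [scores[m] for m in model_order if m in scores]
            if len(ordered) < 2:
                continue  # cannot compare models with fewer than two scores
            # Gap between the strongest and the weakest model.
            spread = ordered[-1] - ordered[0]
            # True when every stronger model scores at least as well.
            monotonic = all(a <= b for a, b in zip(ordered, ordered[1:]))
            candidate = (monotonic, spread, shots)
            # Prefer monotonic settings, then the larger spread.
            if best is None or candidate[:2] > best[:2]:
                best = candidate
        if best is not None:
            summary[task] = {
                "num_shots": best[2],
                "monotonic": best[0],
                "spread": best[1],
            }
    return summary


# Placeholder data for illustration only (not real benchmark results).
model_order = ["small", "medium", "large"]
rows = [
    ("toy_task", "small", 0, 0.25),
    ("toy_task", "medium", 0, 0.26),
    ("toy_task", "large", 0, 0.27),
    ("toy_task", "small", 5, 0.25),
    ("toy_task", "medium", 5, 0.40),
    ("toy_task", "large", 5, 0.60),
]
summary = summarize(rows, model_order)
print(summary)
```

A task whose best setting is non-monotonic or has a spread near zero would be a candidate for removal from the task list, since it does not distinguish the models at any shot count.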

tbarton16, Feb 02 '24 23:02

lgtm! I kinda hate checking in notebooks but I do think it's better than a script in this case.

maxisawesome, Feb 05 '24 22:02

Would you mind adding the MCLI name of a test run you launched, so I can go back later to describe the run and view its logs?

Additionally, a screenshot of the resulting notebook would be good, so that when I come back to this later I can confirm that I got the correct results.

bmosaicml, Feb 09 '24 18:02