llm-foundry
Tessa/callibration script
Here is the code we use to test our benchmark tasks. It runs a series of progressively more advanced models to see whether the benchmarks effectively differentiate between them, and at which number of shots they perform best.
- Select an independent variable and a series of models that correspond to the settings of that variable
- Select clusters
- Edit the list of tasks in `base_callibration.yaml` to reflect the ones you want to see
- Run the launcher script
- When everything is done, run the `analyze_output` notebook, which collates the results from wandb
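
For anyone reading this later, here is a rough sketch of the kind of check the analysis step is meant to support. It is not the notebook's actual code: the `results` dictionary, the model names, the accuracy values, and both helper functions are hypothetical, and it assumes "best number of shots" means highest mean accuracy across the model series and "differentiates" means accuracy strictly increases from one model to the next.

```python
from collections import defaultdict

# Hypothetical results for one task: (model, num_shots) -> accuracy.
# These values are made up for illustration, not real measurements.
results = {
    ("small", 0): 0.25, ("small", 5): 0.30,
    ("medium", 0): 0.31, ("medium", 5): 0.42,
    ("large", 0): 0.40, ("large", 5): 0.55,
}

# Hypothetical model series, ordered from least to most advanced.
models_in_order = ["small", "medium", "large"]


def best_shots(results, models):
    """Return the shot count with the highest mean accuracy across models."""
    by_shots = defaultdict(list)
    for (model, shots), acc in results.items():
        if model in models:
            by_shots[shots].append(acc)
    return max(by_shots, key=lambda s: sum(by_shots[s]) / len(by_shots[s]))


def differentiates(results, models, shots):
    """True if accuracy strictly increases along the model series."""
    accs = [results[(m, shots)] for m in models]
    return all(a < b for a, b in zip(accs, accs[1:]))


if __name__ == "__main__":
    shots = best_shots(results, models_in_order)
    print("best shot count:", shots)
    print("differentiates models:", differentiates(results, models_in_order, shots))
```

A stricter version might require a minimum accuracy gap between adjacent models rather than any strict increase, since small differences can be noise.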
lgtm! I kinda hate checking in notebooks but I do think it's better than a script in this case.
Would you mind adding the MCLI name of a test run you launched so I can go back and `describe run` and view logs later?
Additionally, a screenshot of the resulting notebook would be good, so that when I go back to this later I can confirm that I got the correct results.