Create a tool for the ML team to evaluate sampling configurations
We now have a small Python script that generates completions for random prompts from our dataset with different sampling parameters. The result is stored as a JSON report. We want to use it to manually compare the outputs of different base models and of the fine-tuned models. It would be super-cool if someone from the web team could help us build a comparison UI that allows inspecting 10+ report files.
The json reports have the following properties:
- each file contains information about the models
- each file contains a list of prompts (which are always identical across models: same prompts in the same order)
- for each prompt there are multiple sampling configurations (which are also identical across models per prompt, but NOT across prompts)
- for each sampling configuration we have the same number of outputs (a rough sketch of this structure follows below)
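I don't know the exact field names the script writes, so the following is only a rough sketch of the structure described above (all identifiers are guesses, please check against the real schema in the example reports). Something like this TypeScript shape is probably what the comparison UI would load:

```ts
// Assumed shape of one sampling report file -- field names are guesses
// based on the list above, not the actual schema.
interface SamplingReport {
  model_name: string;        // the model this report was generated with
  prompts: PromptEntry[];    // same prompts, in the same order, in every file
}

interface PromptEntry {
  prompt: string;
  results: SamplingResult[]; // one entry per sampling configuration
}

interface SamplingResult {
  sampling_config: string;   // e.g. "beam5" -- identical across models per prompt
  sampling_params: Record<string, number | string>;
  outputs: string[];         // same number of outputs for each configuration
}

// Load several report files selected via <input type="file" multiple>.
async function loadReports(files: File[]): Promise<SamplingReport[]> {
  return Promise.all(files.map(async (f) => JSON.parse(await f.text()) as SamplingReport));
}
```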
Example report files:
https://github.com/LAION-AI/Open-Assistant/tree/main/model/model_eval/manual/sampling_reports
The UI could for example show the following:
sampling drop-down: [beam5]
Prompt: "How to protect my eyes when I have to stare at my computer screen for longer than 10 hours every day?"
theblackcat102/pythia-1b-deduped-sft: "It's not good to stare at your computer screen for longer than 10 hours.",
theblackcat102/pythia-3b-deduped-sft: "1. Open your eyes as wide as possible.\n2. Close your eyes for a few seconds.\n3. Open your eyes again.\n4. Close your eyes for a few seconds.\n5. Open your eyes."
theblackcat102/pythia-12b-deduped-sft: "You should take regular breaks from staring at your computer screen."
.. show results for all the prompts (e.g. 100) ...
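To render a view like the one above, the UI essentially has to pivot the loaded reports: pick one prompt index and one sampling configuration, then collect the corresponding outputs from every model. A minimal sketch, reusing the assumed `SamplingReport` shape from above:

```ts
// One row of the comparison table: a model and its outputs for the
// currently selected prompt and sampling configuration.
interface ComparisonRow {
  model: string;
  outputs: string[];
}

function compareAt(
  reports: SamplingReport[],
  promptIndex: number,
  samplingConfig: string,  // value of the sampling drop-down, e.g. "beam5"
): ComparisonRow[] {
  return reports.map((report) => {
    const entry = report.prompts[promptIndex];
    const result = entry.results.find((r) => r.sampling_config === samplingConfig);
    return { model: report.model_name, outputs: result ? result.outputs : [] };
  });
}
```

Since the prompts and per-prompt sampling configurations are identical across files, the drop-down options can simply be read from the first loaded report.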
I can work on this
Here, have a play: https://johnflux.github.io/Open-Assistant-Model-Comparer/
Code is here: https://github.com/johnflux/Open-Assistant-Model-Comparer
Would it be possible to integrate this into our website? More precisely, into our Docusaurus docs?
https://github.com/LAION-AI/Open-Assistant/tree/main/docs
https://github.com/Open-Assistant/oasst-model-eval
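Should be doable: Docusaurus doc pages can be written as MDX, which can import and render React components, so the comparer would mainly need to be packaged as a component under `src/components`. A rough sketch of what that could look like (the path, component name, and props are placeholders, and it reuses the hypothetical `SamplingReport` / `compareAt` sketches from above):

```tsx
// src/components/ModelComparer.tsx (placeholder path/name) -- a minimal
// comparison view that could be imported into a Docusaurus MDX doc page.
import React, { useState } from 'react';

export default function ModelComparer({ reports }: { reports: SamplingReport[] }) {
  const [config, setConfig] = useState('beam5');
  const promptIndex = 0; // a real version would let the user page through prompts

  if (reports.length === 0) return <p>No reports loaded.</p>;

  const configs = reports[0].prompts[promptIndex].results.map((r) => r.sampling_config);
  const rows = compareAt(reports, promptIndex, config);

  return (
    <div>
      <select value={config} onChange={(e) => setConfig(e.target.value)}>
        {configs.map((c) => <option key={c}>{c}</option>)}
      </select>
      <p>{reports[0].prompts[promptIndex].prompt}</p>
      <table>
        <tbody>
          {rows.map((row) => (
            <tr key={row.model}>
              <td>{row.model}</td>
              <td><pre>{row.outputs.join('\n')}</pre></td>
            </tr>
          ))}
        </tbody>
      </table>
    </div>
  );
}
```

A doc page saved as `.mdx` could then do `import ModelComparer from '@site/src/components/ModelComparer';` and render `<ModelComparer reports={...} />` inline.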