What are the details of the data used for the reward model OpenAssistant/oasst-rm-2-pythia-6.9b-epoch-1?
Many thanks for your great open-sourcing effort! I am new to this field and particularly interested in the data used to train the reward model. I noticed that there is a simple dataset config for this, but I am a little confused about the details:
```yaml
datasets:
  - oasst_export:
      lang: "en,es,de,fr"
      input_file_path: 2023-03-27_oasst_research_ready_synth.jsonl.gz
      val_split: 0.1
  - anthropic_rlhf:
      fraction: 0.1
      max_val_set: 1000
  - shp:
      max_val_set: 1000
  - hellaswag:
      fraction: 0.5
      max_val_set: 1000
  - webgpt:
      val_split: 0.05
      max_val_set: 1000
  - hf_summary_pairs:
      fraction: 0.1
      max_val_set: 250
```
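For context, here is how I currently read the three knobs (`fraction`, `val_split`, `max_val_set`). The helper below is purely my own sketch of that reading, not code from the Open-Assistant repo; the function name and defaults are my assumptions:

```python
import random

def subsample_and_split(examples, fraction=1.0, val_split=0.0,
                        max_val_set=None, seed=42):
    """My guessed semantics: `fraction` subsamples the dataset,
    `val_split` carves a validation share out of that subsample,
    and `max_val_set` caps the validation set size."""
    rng = random.Random(seed)
    subset = rng.sample(list(examples), int(len(examples) * fraction))
    n_val = int(len(subset) * val_split)
    if max_val_set is not None:
        n_val = min(n_val, max_val_set)
    # return (train, val)
    return subset[n_val:], subset[:n_val]
```

Please correct me if the fields are combined differently in the actual training code.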
How can we use hellaswag as a comparison dataset? There seem to be multiple choices (rather than two). Is there any experimental evidence supporting the fraction settings we are currently using?
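To make my question concrete: each HellaSwag item has one context, four candidate endings, and one labeled correct ending, so my naive guess is that it gets converted into pairwise comparisons by ranking the gold ending above each incorrect one, roughly like the sketch below (again just my assumption, not necessarily what the training code does):

```python
from datasets import load_dataset

def hellaswag_to_pairs(split="train"):
    """Guessed conversion: pair the gold ending against each wrong
    ending, yielding (prompt, chosen, rejected) comparison triples."""
    ds = load_dataset("hellaswag", split=split)
    for row in ds:
        gold = int(row["label"])          # index of the correct ending
        chosen = row["endings"][gold]
        for i, ending in enumerate(row["endings"]):
            if i != gold:
                yield {"prompt": row["ctx"],
                       "chosen": chosen,
                       "rejected": ending}
```

If that is roughly what happens, one item would yield three comparisons, and I am unsure how that interacts with the `fraction: 0.5` setting.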
Many thanks in advance for any responses!