
How to prepare 2023-02-12_oasst_prod.jsonl

ghtaro opened this issue 2 years ago • 3 comments

Hi,

I am trying to run the sample scripts in https://github.com/LAION-AI/Open-Assistant/tree/main/model. I would like to run the RL training script, but I could not prepare the oasst dataset specified in the config YAML.


It would be very helpful if you could tell me how to do it.

ghtaro, Mar 03 '23 01:03

The OA dataset has not been released yet. If you want to prepare training code, you can look at a sample of 100 English trees here: https://github.com/Open-Assistant/oasst-model-eval/blob/main/model_eval/manual/data/en_100_tree.jsonl
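For anyone starting on training code against that sample, a minimal sketch of reading it might look like this. It assumes each JSONL line is one message tree whose root message sits under a `prompt` key, and that every message carries `role`, `text`, and an optional nested `replies` list; that is our understanding of the OA tree format, so check it against the actual file. The helper names `load_trees` and `iter_threads` are hypothetical, not part of the Open-Assistant codebase.

```python
import json
from typing import Iterable, Iterator, List, Tuple

Turn = Tuple[str, str]  # (role, text)


def load_trees(lines: Iterable[str]) -> List[dict]:
    """Parse JSONL input: one JSON object (a message tree) per non-empty line."""
    return [json.loads(line) for line in lines if line.strip()]


def iter_threads(node: dict, prefix: Tuple[Turn, ...] = ()) -> Iterator[List[Turn]]:
    """Yield each root-to-leaf conversation path as a list of (role, text) turns.

    Assumption: every message dict has 'role', 'text' and an optional 'replies' list.
    """
    path = prefix + ((node.get("role"), node.get("text")),)
    replies = node.get("replies") or []
    if not replies:
        # Leaf message: the path from the root to here is one complete thread.
        yield list(path)
    for child in replies:
        yield from iter_threads(child, path)


# Usage with a local copy of the sample file (the path is illustrative):
#   with open("en_100_tree.jsonl", encoding="utf-8") as f:
#       trees = load_trees(f)
#   threads = [t for tree in trees for t in iter_threads(tree["prompt"])]
```

Flattening each tree into root-to-leaf threads gives plain prompt/reply sequences, which is a convenient shape for supervised or reward-model experiments before the full dataset is available.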

If you are interested in helping us train the RL model, please join the OA Discord and ping me (or other members of the dev team).

andreaskoepf, Mar 05 '23 10:03

Are there any open datasets I can use to try training it?

huangtao36, Mar 06 '23 07:03

@andreaskoepf Thank you very much for your reply. I managed to run RL training with WebGPT, but I will definitely try en_100_tree and visit the OA Discord!

ghtaro, Mar 06 '23 13:03