Andreas Köpf


> I would take inspiration from LAION's earlier successes and lean more towards distributed pre-processing of training data. There is plenty of cost in things like the reinforcement learning rollouts...

The OA dataset has not been released yet. If you want to prepare training code, you can look at a sample of 100 English trees here: https://github.com/Open-Assistant/oasst-model-eval/blob/main/model_eval/manual/data/en_100_tree.jsonl If you are interested...
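For anyone who wants to get a head start, a minimal sketch of iterating over the sample trees might look like the following. The field names (`prompt`, `text`, `replies`) are assumptions about the JSONL schema, not confirmed here:

```python
import json

# Minimal sketch for walking the sample conversation trees; the field names
# ("prompt", "text", "replies") are assumptions about the JSONL schema.
def iter_threads(node, path=()):
    """Yield every root-to-leaf message path in a conversation tree."""
    path = path + (node["text"],)
    replies = node.get("replies") or []
    if not replies:
        yield path
    for child in replies:
        yield from iter_threads(child, path)

with open("en_100_tree.jsonl", encoding="utf-8") as f:
    for line in f:
        tree = json.loads(line)
        root = tree["prompt"]  # assumed: each line holds one tree rooted at a prompt
        for thread in iter_threads(root):
            print(" -> ".join(text[:40] for text in thread))
```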

Currently the website asks the backend via the endpoints `api/v1/users/{id}/stats/{timeframe}` or `api/v1/leaderboards/{timeframe}` ... and uses the `UserScore`/`LeaderboardStats` model classes defined in protocol.py. The computation of the XP level could, for example, be...
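As a rough illustration, the XP-level mapping could look like the sketch below. Both the `level_from_score` helper and the geometric thresholds are hypothetical; nothing like this exists in protocol.py yet:

```python
# Hypothetical sketch of an XP-level mapping derived from the leaderboard
# score; the helper name and the threshold formula are illustrative only.
def level_from_score(score: int, base: int = 100, growth: float = 1.5) -> int:
    """Return the level reached with `score` XP, using geometrically growing thresholds."""
    level, threshold = 1, base
    while score >= threshold:
        score -= threshold
        threshold = int(threshold * growth)
        level += 1
    return level

assert level_from_score(0) == 1
assert level_from_score(100) == 2  # 100 XP clears the first threshold
```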

I am closing this to reduce confusion, since we are effectively following a very different, much simpler plan.

I like Jordi's proposal. @CloseChoice, do you think you could add this? Something similar to [summarization.py#L151](https://github.com/LAION-AI/Open-Assistant/blob/a570b04b93d1aa591369e41081c571046bbd7d3c/model/model_training/custom_datasets/summarization.py#L151) (just simpler, to handle the different whitespace ...).
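For reference, a minimal sketch of the kind of whitespace cleanup meant here; this is an assumption about what the linked summarization.py code does, not a copy of it:

```python
import re

# Assumed sketch of the whitespace normalization referred to above;
# the linked summarization.py line may differ in detail.
def clean_whitespace(text: str) -> str:
    """Collapse runs of spaces/tabs/newlines into single spaces and trim."""
    return re.sub(r"\s+", " ", text).strip()

assert clean_whitespace("hello \n\t world ") == "hello world"
```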

I'm closing this for now, since reverse augmentation was causing more harm than good.

@MrlolDev could you please try to make pre-commit happy? Running `pre-commit run --all-files` locally should show what needs fixing.

Any model based on an HF-Transformers CausalLM should be usable without problems. If you have compute, you could try fine-tuning a GPT-JT model on the OA dataset.
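A minimal sketch of what that could look like with the Hugging Face API; the model id is illustrative (`togethercomputer/GPT-JT-6B-v1` is the Hub name I'd try, but verify it), and any `AutoModelForCausalLM`-compatible checkpoint works the same way:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative sketch: load a causal LM from the Hub for fine-tuning.
# The model id is an assumption; check the exact GPT-JT repo name on the Hub.
model_name = "togethercomputer/GPT-JT-6B-v1"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# From here the usual fine-tuning loop applies, e.g. via transformers.Trainer
# on tokenized OA conversation threads.
```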

> Would GPT-JT be considered as an option for a pretrained model to fine-tune into the final model?

Yes, that's definitely something we should try. Also probably the larger...