Bobak Hashemi comments

Results: 20 comments by Bobak Hashemi

Train a reward model based on RankGen

Hey, a few Qs: 1. Any suggestions for how to compare with the other tasks? I'm thinking we might want to make comparing our pretrained models its own issue since all...

Train a reward model based on RankGen

Added draft PR, mainly working out of [this notebook](https://github.com/LAION-AI/Open-Assistant/blob/29b4ab533d1202fc41423af675211fcfaebe6aad/model/reward/rankgen/proto.ipynb) for now:

Train a reward model based on RankGen

As per discussion with @theblackcat102 I decided to build this on top of their trainer. The code for this is in the new PR. I am also training the model...

Try Supervised Fine-Tuning on pseudo-QA-data

Are we planning to use an encoder-decoder model like T5 or a decoder-only model (or is this irrelevant at this stage)? I'm interested in understanding the choice of T5...

Try Supervised Fine-Tuning on pseudo-QA-data

> The reason why I thought starting with T5 would be a good idea is that Flan-T5 outperforms OPT-IML. Also, once we have the training codebase available, we will be...

[WIP] Train Rankgen ranking model for RLHF

@theblackcat102 Yeah, I noticed there was a lot of overfitting. In the past, when using these contrastive-style losses, I normally clipped them, but looking at your code it doesn't seem...
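
For readers unfamiliar with the idea, here is a minimal sketch of one way a contrastive-style ranking loss can be clipped. It is an illustration only, not the code from the PR: the function name `clipped_pairwise_loss`, the `max_diff` parameter, and the choice to clamp the score difference (rather than, say, the loss value or the gradients) are all assumptions.

```python
import math


def clipped_pairwise_loss(pos_score: float, neg_score: float, max_diff: float = 5.0) -> float:
    """Pairwise ranking loss -log(sigmoid(pos - neg)) with a clamped score gap.

    Hypothetical helper: clamping the gap to [-max_diff, max_diff] bounds the
    loss from above and stops the model from being rewarded for pushing an
    already well-separated pair even further apart.
    """
    diff = pos_score - neg_score
    # Clamp the score difference before applying the logistic loss.
    diff = max(-max_diff, min(max_diff, diff))
    # -log(sigmoid(diff)) == log(1 + exp(-diff))
    return math.log1p(math.exp(-diff))
```

With this form, any pair whose gap exceeds `max_diff` contributes the same small constant loss, so it no longer drives the scores apart; whether that helps with the overfitting discussed here would need to be checked empirically.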

[WIP] Train Rankgen ranking model for RLHF

Closing, superseded by https://github.com/LAION-AI/Open-Assistant/pull/313

Bth5032/78 blackcat trainer

> @bth5032 thank you! could you run `pre-commit run --all-files` to make linters happy? Thanks! > Not trying to nitpick anyway, but just curious why not assign the default value...

Get model evaluation working on the reward model trainer

Update: it seems this issue is due to the RankGen model not returning labels [here](https://github.com/LAION-AI/Open-Assistant/blob/main/model/reward/instructor/trainer.py#L32). To fix this issue, simply return the labels (the label is [always 0](https://github.com/LAION-AI/Open-Assistant/blob/main/model/reward/instructor/trainer.py#L114), this is...
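
As a rough illustration of the fix described above, the sketch below returns an all-zero label list alongside the scores so that an evaluation step can compute a metric. It assumes the preferred candidate always sits at index 0 of each score row; the names `scores_with_labels` and `ranking_accuracy` are hypothetical and do not come from the repository's trainer.

```python
from typing import List, Tuple


def scores_with_labels(batch_scores: List[List[float]]) -> Tuple[List[List[float]], List[int]]:
    """Return the scores together with labels (hypothetical helper).

    Assumption: the preferred candidate is always at index 0 of each row,
    so every label is 0.
    """
    labels = [0] * len(batch_scores)
    return batch_scores, labels


def ranking_accuracy(batch_scores: List[List[float]], labels: List[int]) -> float:
    """Fraction of rows where the highest-scoring candidate matches the label."""
    correct = 0
    for row, label in zip(batch_scores, labels):
        predicted = max(range(len(row)), key=row.__getitem__)
        if predicted == label:
            correct += 1
    return correct / len(labels)


scores, labels = scores_with_labels([[2.0, 1.0], [0.5, 0.9]])
accuracy = ranking_accuracy(scores, labels)
```

Here the first row ranks the preferred candidate highest and the second does not, so the accuracy is 0.5.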

Get model evaluation working on the reward model trainer

@jackapbutler Yes, to the best of my knowledge this task is still open, but I'm not sure if the code in the repo is up to date. @theblackcat102 would know...