salman
Thanks so much for your feedback @kartikayk. I think it makes sense to start with the reward model implementation. There's a pre-trained reward model for [Mistral-7B](https://huggingface.co/Ray2333/reward-model-Mistral-7B-instruct-Unified-Feedback). Implementing component and model...
I love the support and positivity @kartikayk :) I've put a PR up for a (hopefully) pretty lightweight and non-invasive `TransformerClassifier` implementation. I could use some guidance on numerical testing....
@kartikayk the TransformerClassifier PR is pretty much good to go. Would you still like to collaborate on the RLHF process? There are a lot of steps, and I have some design...
Sounds good! Let me know what you're interested in and I can share my thoughts/updates on what I'm working on. Let's chat more on Discord.
It's v3rsachie
I've updated scripts for the rest of the mistral components. I still need to write the comparison (which involves mapping state dicts), update the unit test, and (potentially) add LoRA comparisons.
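For context, the state-dict mapping step could look roughly like this. The key names below are illustrative placeholders, not the actual reference/torchtune names:

```python
# Hypothetical key mapping between a reference mistral checkpoint and our
# implementation -- the real key names will differ; this just shows the approach.
KEY_MAP = {
    "tok_embeddings.weight": "embeddings.weight",
    "norm.weight": "final_norm.scale",
    "output.weight": "output_proj.weight",
}


def map_state_dict(reference_sd: dict) -> dict:
    """Rename reference keys so the state dict loads into our model.

    Keys not present in KEY_MAP pass through unchanged.
    """
    return {KEY_MAP.get(k, k): v for k, v in reference_sd.items()}


# Values would be tensors in practice; plain placeholders keep the sketch small.
mapped = map_state_dict({"tok_embeddings.weight": "W_emb", "norm.weight": "W_norm"})
assert "embeddings.weight" in mapped and "final_norm.scale" in mapped
```

Once both state dicts share key names, loading the reference weights into our model and comparing outputs becomes straightforward.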
Okay, all seems good. We now have a unit test for the base `mistral` model using the copied implementation from [the mistral repo](https://github.com/mistralai/mistral-src/blob/147c4e68279b90eb61b19bdea44e16f5539d5a5d/one_file_ref.py). For the unfortunate reviewer seeing my +1160...
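For anyone skimming the diff: the parity check boils down to something like the sketch below. `reference_forward` / `our_forward` stand in for the copied `one_file_ref.py` model and our implementation (with the same mapped weights loaded into both); the tolerance is illustrative:

```python
import math


# Stand-ins for the two implementations under test. In the real unit test these
# would be the copied mistral reference model and our model, both with the same
# (mapped) state dict loaded.
def reference_forward(x):
    return [math.tanh(v) * 2.0 for v in x]


def our_forward(x):
    return [2.0 * math.tanh(v) for v in x]


def assert_close(a, b, atol=1e-6):
    """Element-wise comparison with an absolute tolerance."""
    assert len(a) == len(b)
    for va, vb in zip(a, b):
        assert abs(va - vb) <= atol, f"mismatch: {va} vs {vb}"


# Same fixed inputs through both implementations; outputs must agree.
tokens = [0.1, -0.5, 2.0]
assert_close(reference_forward(tokens), our_forward(tokens))
```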
Not in vain at all - I learnt lots! I've updated and added a README.
Thanks again for your review @ebsmothers :)
Just wanted to chime in to raise a minor point. There's a snag with generation on Apple silicon, as `torch.isin` is [only available on MPS in torch nightly](https://github.com/pytorch/pytorch/issues/124518). We make...
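For stable-torch users, one possible workaround is the broadcast-compare trick, which reproduces `torch.isin` for a 1-D set of test elements without the op itself. Just a sketch, not necessarily what ends up in the fix:

```python
import torch


def isin_fallback(elements: torch.Tensor, test_elements: torch.Tensor) -> torch.Tensor:
    # Equivalent to torch.isin for a 1-D test_elements: broadcast-compare each
    # element against every test element, then reduce over that axis with any().
    # Uses only basic ops, so it runs on MPS in stable torch builds.
    return (elements.unsqueeze(-1) == test_elements.view(-1)).any(dim=-1)


x = torch.tensor([1, 2, 3, 4])
stop_tokens = torch.tensor([2, 4])
mask = isin_fallback(x, stop_tokens)
assert torch.equal(mask, torch.tensor([False, True, False, True]))
```

The intermediate comparison tensor is `elements.shape + (len(test_elements),)`, so this is fine for small stop-token sets but not something to use with a huge `test_elements`.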