salman
Thanks so much for your feedback @kartikayk. I think it makes sense to start with the reward model implementation. There's a pre-trained reward model for [Mistral-7B](https://huggingface.co/Ray2333/reward-model-Mistral-7B-instruct-Unified-Feedback). Implementing component and model...
I love the support and positivity @kartikayk :) I've put a PR up for a (hopefully) pretty lightweight and non-invasive `TransformerClassifier` implementation. I could use some guidance on numerical testing....
@kartikayk the TransformerClassifier PR is pretty much good to go. Would you still like to collaborate on the RLHF process? There are a lot of steps, and I have some design...
Sounds good! Let me know what you're interested in and I can share my thoughts/updates on what I'm working on. Let's chat more on Discord.
It's v3rsachie
I've updated scripts for the rest of the mistral components. I still need to write the comparison (which involves mapping state dicts), update the unit test, and (potentially) add LoRA comparisons.
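For context, the state-dict mapping step could look roughly like this. The key names below are illustrative placeholders, not the actual reference/torchtune names:

```python
# Hypothetical key mapping between a reference mistral checkpoint and our
# implementation -- the real key names will differ; this just shows the approach.
KEY_MAP = {
    "tok_embeddings.weight": "embeddings.weight",
    "norm.weight": "final_norm.scale",
    "output.weight": "output_proj.weight",
}


def map_state_dict(reference_sd: dict) -> dict:
    """Rename reference keys so the state dict loads into our model.

    Keys not present in KEY_MAP pass through unchanged.
    """
    return {KEY_MAP.get(k, k): v for k, v in reference_sd.items()}


# Values would be tensors in practice; plain placeholders keep the sketch small.
mapped = map_state_dict({"tok_embeddings.weight": "W_emb", "norm.weight": "W_norm"})
assert "embeddings.weight" in mapped and "final_norm.scale" in mapped
```

Once both state dicts share key names, loading the reference weights into our model and comparing outputs becomes straightforward.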
Okay, all seems good. We now have a unit test for the base `mistral` model using the copied implementation from [the mistral repo](https://github.com/mistralai/mistral-src/blob/147c4e68279b90eb61b19bdea44e16f5539d5a5d/one_file_ref.py). For the unfortunate reviewer seeing my +1160...
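For anyone skimming the diff: the parity check boils down to something like the sketch below. `reference_forward` / `our_forward` stand in for the copied `one_file_ref.py` model and our implementation (with the same mapped weights loaded into both); the tolerance is illustrative:

```python
import math


# Stand-ins for the two implementations under test. In the real unit test these
# would be the copied mistral reference model and our model, both with the same
# (mapped) state dict loaded.
def reference_forward(x):
    return [math.tanh(v) * 2.0 for v in x]


def our_forward(x):
    return [2.0 * math.tanh(v) for v in x]


def assert_close(a, b, atol=1e-6):
    """Element-wise comparison with an absolute tolerance."""
    assert len(a) == len(b)
    for va, vb in zip(a, b):
        assert abs(va - vb) <= atol, f"mismatch: {va} vs {vb}"


# Same fixed inputs through both implementations; outputs must agree.
tokens = [0.1, -0.5, 2.0]
assert_close(reference_forward(tokens), our_forward(tokens))
```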
Not in vain at all - I learnt lots! I've updated and added a README.
Thanks again for your review @ebsmothers :)
Just wanted to chime in to raise a minor point. There's a snag with generation on Apple silicon, as `torch.isin` is [only available on MPS in torch nightly](https://github.com/pytorch/pytorch/issues/124518). We make...
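For stable-torch users, one possible workaround is the broadcast-compare trick, which reproduces `torch.isin` for a 1-D set of test elements without the op itself. Just a sketch, not necessarily what ends up in the fix:

```python
import torch


def isin_fallback(elements: torch.Tensor, test_elements: torch.Tensor) -> torch.Tensor:
    # Equivalent to torch.isin for a 1-D test_elements: broadcast-compare each
    # element against every test element, then reduce over that axis with any().
    # Uses only basic ops, so it runs on MPS in stable torch builds.
    return (elements.unsqueeze(-1) == test_elements.view(-1)).any(dim=-1)


x = torch.tensor([1, 2, 3, 4])
stop_tokens = torch.tensor([2, 4])
mask = isin_fallback(x, stop_tokens)
assert torch.equal(mask, torch.tensor([False, True, False, True]))
```

The intermediate comparison tensor is `elements.shape + (len(test_elements),)`, so this is fine for small stop-token sets but not something to use with a huge `test_elements`.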