Adding support for constitution
I'm wondering whether users could supply a constitution and use it to compute the rewards for training. https://www.anthropic.com/constitutional.pdf
Right now I'm experimenting with either using a zero-shot BART classifier that scores several labels at once:
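from transformers import pipeline

classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")
candidate_labels = ["well-being", "non-judgmental", "empathetic", "tailored", "privacy", "crisis", "ethical"]
sequence_to_classify = "I hear you, and I'm here to help."  # placeholder: replace with the generated text
# multi_label=True scores each label independently (older releases called this multi_class)
classifier(sequence_to_classify, candidate_labels, multi_label=True)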
Or using an open-domain model.
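For the zero-shot option, here is a minimal sketch of how the per-label scores might be collapsed into a single scalar reward. The function name `constitution_reward` and the optional `weights` mapping are hypothetical and not part of trl or transformers. The input is assumed to have the same `labels`/`scores` layout that the zero-shot pipeline returns, and the example dict is hand-written so the snippet runs without downloading a model.

```python
from typing import Any, Dict, Optional


def constitution_reward(
    result: Dict[str, Any],
    weights: Optional[Dict[str, float]] = None,
) -> float:
    """Collapse zero-shot label scores into one scalar reward.

    `result` is assumed to look like the zero-shot pipeline output:
    {"sequence": ..., "labels": [...], "scores": [...]}.
    `weights` is a hypothetical per-principle weighting; labels that
    are missing from it default to a weight of 1.0.
    """
    weights = weights or {}
    total = 0.0
    norm = 0.0
    for label, score in zip(result["labels"], result["scores"]):
        w = weights.get(label, 1.0)
        total += w * score
        norm += abs(w)
    # Weighted mean of the scores; 0.0 if every weight is zero.
    return total / norm if norm else 0.0


# Hand-written stand-in for a pipeline output (not real model scores).
example = {
    "sequence": "I hear you, and I'm here to help.",
    "labels": ["empathetic", "ethical", "privacy"],
    "scores": [0.9, 0.6, 0.3],
}
reward = constitution_reward(example)
weighted_reward = constitution_reward(example, weights={"privacy": 0.0})
```

With a real model this would be called as `constitution_reward(classifier(text, candidate_labels, multi_label=True))`. A weighted mean is only one possible choice; a minimum over principles or a learned combination could work as well.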

Here is some example code I made in a branch. https://github.com/lvwerra/trl/commit/d785c2274f365c041f9653a4364da7ff6060aeba
I would love some feedback. I'm sure there is a better way of doing this, so treat the code as inspiration for how this could be achieved.
A simpler option I thought of: you could probably just use the included sentiment analysis pipeline if you want positive or negative texts in a particular style, based on a dataset that is available on the hub.
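As a rough sketch of that simpler route, the snippet below maps sentiment pipeline outputs to rewards. It assumes a binary classifier whose output is a list of `{"label": ..., "score": ...}` dicts holding only the top label, with label names `POSITIVE` and `NEGATIVE`; other models on the hub may use different label names. The helper `sentiment_rewards` is hypothetical, and the sample outputs are hand-written rather than produced by a model.

```python
from typing import Any, Dict, List


def sentiment_rewards(
    outputs: List[Dict[str, Any]],
    target: str = "POSITIVE",
) -> List[float]:
    """Turn top-label sentiment outputs into rewards in [0, 1].

    Assumes a binary classifier: if the predicted label is not the
    target, the target's probability is taken to be 1 - score.
    """
    rewards = []
    for out in outputs:
        score = float(out["score"])
        rewards.append(score if out["label"] == target else 1.0 - score)
    return rewards


# Hand-written stand-ins for pipeline outputs (not real model scores).
sample_outputs = [
    {"label": "POSITIVE", "score": 0.98},
    {"label": "NEGATIVE", "score": 0.90},
]
positive_rewards = sentiment_rewards(sample_outputs)
negative_rewards = sentiment_rewards(sample_outputs, target="NEGATIVE")
```

With a real model, `outputs` would come from something like `pipeline("sentiment-analysis")(texts)`. Passing `target="NEGATIVE"` flips the objective toward negative texts.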
Sounds really cool! Have you been able to test it already? If you have a working example then we can add it as an example! This might also be interesting to @lewtun.
Closing this for now - feel free to reopen if there's an update!