alignment-handbook
Question about AI Feedback (AIF)
In the AI Feedback (AIF) phase, where GPT-4 serves as the teacher model, I am curious whether GPT-4 might have a propensity to assign higher ratings to its own outputs.
Additionally, I am interested in the statistical distribution of the large language models whose completions were selected as $y_w$ during the AI Feedback (AIF) evaluation in your study. Have you analyzed how frequently each LLM was selected?
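To make the second question concrete, here is a minimal sketch of the kind of analysis I have in mind: counting how often each model's completion was chosen as $y_w$ across the preference pairs. The record layout and field name `chosen_model` are assumptions for illustration, not the actual dataset schema.

```python
from collections import Counter

# Hypothetical preference pairs: each record notes which model produced
# the chosen (y_w) completion. Field names are illustrative assumptions.
preference_pairs = [
    {"chosen_model": "gpt-4"},
    {"chosen_model": "claude-2"},
    {"chosen_model": "gpt-4"},
    {"chosen_model": "llama-2-70b-chat"},
]

# Tally how often each model's output was selected as y_w.
counts = Counter(pair["chosen_model"] for pair in preference_pairs)
total = sum(counts.values())
for model, n in counts.most_common():
    print(f"{model}: {n} ({n / total:.1%})")
```

A table of these frequencies over the full AIF dataset would directly answer whether GPT-4's own completions dominate the chosen responses.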
Thank you!