chainlit Flows to collect preference tuning data (A/B Testing)

Flows to collect preference tuning data (A/B Testing)

Open chughtapan opened this issue 6 months ago • 0 comments

trafficstars

Is your feature request related to a problem? Please describe. It is a common practice when developing generative AI applications to serve different responses to a user and collect which one is better. This can be used in various ways for e.g., to do offline evals, learning reward models and doing preference fine-tuning, etc. I'm currently using chainlit as demo for my agent, but I'm not sure if there is any way to do so and collect the data.

Describe the solution you'd like Maybe just create a new message type (e.g., SplitMessage) or something, where both components can be updated independently. Would also need a way for the user to choose if they prefer A or B, and then the chat history can only store that and discard the other one.

Describe alternatives you've considered I don't think there's any alternatives currently which can offer this easily with Chainlit right now. I think lmsys uses a gradio based ui.

May 19 '25 22:05 chughtapan

chainlit chainlit copied to clipboard

Flows to collect preference tuning data (A/B Testing)

chainlit
chainlit copied to clipboard