
[FEATURE] Add conversation support to fields in Argilla dataset (ChatField)

Open dchichkov opened this issue 2 years ago • 11 comments

Is your feature request related to a problem? Please describe. It is not clear how to use the Feedback scheme to store a multi-turn conversation, and there doesn't seem to be support in the web interface for annotating feedback on multi-turn conversations. A lot of instruction-tuning data is multi-turn, and the Feedback scheme only allows recording feedback on a single response. Other schemes (e.g. Text2Text) also do not seem suitable.

Describe the solution you'd like Native support for multi-turn conversations, allowing annotators to give feedback on each turn of the conversation.

Describe alternatives you've considered Getting feedback on each turn individually and providing the context (the full conversation) for each turn. This is not great, as it means re-reading each conversation N times, where N is the number of turns, which slows progress substantially.

Another alternative is storing the complete conversation as text and using an external tool, like Gradio, to annotate. This diminishes the value of Argilla, as it requires adding an external, non-integrated tool that is disconnected from search, etc. Storing the conversation as text instead of structured data also reduces the ability to filter conversations in a structured way (e.g. by last response). A further alternative is storing the complete conversation as a .json field, which is also not great, as nearly none of Argilla's tooling supports it.

Additional context Ideally this would be natively supported in the web UI, but it could also be implemented with a tool/plugin outside the current web GUI. Using Gradio/Chat as a frontend may be a good option. Either way, this still requires support at the schema level.
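
For reference, this is roughly what the "conversation as a single text field" workaround looks like today. A minimal sketch, assuming the 1.x FeedbackDataset API of the Argilla Python client; field and question names are just illustrative:

```python
import argilla as rg

# Sketch of the "conversation as text" workaround discussed above,
# using the FeedbackDataset API (Argilla 1.x). Names are illustrative.
dataset = rg.FeedbackDataset(
    fields=[
        # whole dialogue rendered as one markdown blob
        rg.TextField(name="conversation", use_markdown=True),
    ],
    questions=[
        rg.RatingQuestion(name="overall_quality", values=[1, 2, 3, 4, 5]),
        rg.TextQuestion(name="comments", required=False),
    ],
)

turns = [
    ("user", "What are the colors of the bus in the image?"),
    ("assistant", "The bus in the image is white and red."),
]
record = rg.FeedbackRecord(
    fields={
        "conversation": "\n\n".join(f"**{role}**: {text}" for role, text in turns),
    }
)
dataset.add_records([record])
```

This works, but the per-turn structure is lost, which is exactly the limitation described above.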

dchichkov avatar Jul 05 '23 01:07 dchichkov

Hi @dchichkov ! Thanks for the suggestion! I definitely want to know more about your use case. Can I ask if the number of turns is always the same or if it changes for each conversation?

nataliaElv avatar Jul 05 '23 07:07 nataliaElv

Thanks @dchichkov! Looking forward to your comments. I think this is an important field type (or extension to the current text field). I believe I briefly commented on this with @alvarobartt when we did the langchain callback. In my experience, the number of turns is variable across records. There are several example datasets available on the Hub, but @dchichkov, if you are looking at a specific dataset/format, it would be awesome if you could give us some pointers.

dvsrepo avatar Jul 05 '23 10:07 dvsrepo

Exactly, this was also something we discussed during the ML weekly. We could potentially allow something like a TextListField and TextQuestionList.

davidberenstein1957 avatar Jul 06 '23 05:07 davidberenstein1957

Hi @nataliaElv & @dvsrepo. An example could be https://huggingface.co/datasets/liuhaotian/LLaVA-Instruct-150K

The format there (this is a single multi-turn conversation) is:

    [
      {
        "id": "000000033471",
        "image": "000000033471.jpg",
        "conversations": [
          { "from": "human", "value": "<image>\nWhat are the colors of the bus in the image?" },
          { "from": "gpt", "value": "The bus in the image is white and red." },
          { "from": "human", "value": "What feature can be seen on the back of the bus?" },
          { "from": "gpt", "value": "The back of the bus features an advertisement." },
          { "from": "human", "value": "Is the bus driving down the street or pulled off to the side?" },
          { "from": "gpt", "value": "The bus is driving down the street, which is crowded with people and other vehicles." }
        ]
      },
      ...
    ]

I'm not sure that this exact format is the best. I think that simply using Markdown with URLs for images is a lot more natural; it removes extra complexity and allows more flexibility on the renderer and the dataset side.
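
To illustrate, a rough sketch (not a proposal) of flattening one record of that format into markdown with an image URL; the base URL is a placeholder:

```python
# Sketch: render one LLaVA-style record as markdown, replacing the <image>
# placeholder with an inline image URL. The base URL is hypothetical.
IMAGE_BASE_URL = "https://example.org/coco/"  # placeholder, not a real host

def conversation_to_markdown(record: dict) -> str:
    image_md = f"![image]({IMAGE_BASE_URL}{record['image']})"
    lines = []
    for turn in record["conversations"]:
        text = turn["value"].replace("<image>", image_md)
        lines.append(f"**{turn['from']}**: {text}")
    return "\n\n".join(lines)

example = {
    "id": "000000033471",
    "image": "000000033471.jpg",
    "conversations": [
        {"from": "human", "value": "<image>\nWhat are the colors of the bus in the image?"},
        {"from": "gpt", "value": "The bus in the image is white and red."},
    ],
}
print(conversation_to_markdown(example))
```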

dchichkov avatar Jul 06 '23 06:07 dchichkov

The number of conversations, turns, images, participants, and participant names varies for every conversation. To give some rough numbers:

  • Conversations: 100k
  • Conversation turns: 1 to 50, median 5
  • Turn length: 1 to 500 lines of text, median 2
  • Participants: 1 to 10, median 2
  • Images: 0 to 7, median 1

dchichkov avatar Jul 06 '23 06:07 dchichkov

+1 for this feature request. I feel like this will be an increasingly important functionality in annotation interfaces.

To give some more input/examples: DeepMind and Anthropic have created similar interfaces for their internal use and it would be great to have an open source option.

  1. In this paper from DeepMind, on p. 51, you can see their interface. (Note that this interface also provides snippets for sources, which is not necessary for Argilla, I think.)

  2. In this paper from Anthropic, on p. 5, you can see their interface.

They've always limited the number of turns to a specific amount (around 5). Another important thing in both of these studies is that they have human text input + live model text output (which could come from e.g. the Hugging Face Inference API or some other LLM APIs), as well as separate annotation boxes to rate each model output (which probably makes things a bit more complicated).
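
As a very rough sketch of the live-model-output part, assuming the huggingface_hub InferenceClient; the model id is only illustrative:

```python
from huggingface_hub import InferenceClient

# Sketch of collecting a live model turn for a human/model conversation loop.
# The model id is illustrative; any hosted text-generation model would do.
client = InferenceClient(model="HuggingFaceH4/zephyr-7b-beta")

history = []  # list of (speaker, text) tuples built up turn by turn
user_turn = input("You: ")
history.append(("human", user_turn))

prompt = "\n".join(f"{speaker}: {text}" for speaker, text in history) + "\nassistant:"
model_turn = client.text_generation(prompt, max_new_tokens=256)
history.append(("assistant", model_turn))
print("Model:", model_turn)
```

Each (human turn, model turn) pair could then be logged as a record to annotate.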

One open-source implementation for chat-based annotation is Meta's Mephisto/ParlAI chat interface, which directly integrates with MTurk. I haven't tested it yet, though. It's based on React and probably requires some JS/React knowledge to customize: https://github.com/facebookresearch/Mephisto/tree/main/examples/parlai_chat_task_demo

MoritzLaurer avatar Jul 11 '23 13:07 MoritzLaurer

Also +1 for this feature request. When you think of chatbot-style responses with LLMs, you quickly end up with multi-turn conversations.

To add a bit more colour on what sort of feedback may be useful to log on a multi-turn conversation:

  1. Conversation-level metrics - a 1-5 rating, flags, classifiers, etc. at the conversation level. This is very similar to the current demos, and the feedback is captured for the complete conversation block.
  2. Conversation-section feedback - for example, one response from gpt says 'To do that you need to do ... XYZ'. At this level it will be helpful to log metrics (ratings, flags, thumbs-up/down, etc.). It is also helpful at this level for the human annotator to be able to correct or rewrite the response (see the sketch after this list for how this could be approximated today).
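
A rough sketch of how both levels could be approximated with the existing Argilla 1.x question types; the names and the fixed turn limit are just illustrative:

```python
import argilla as rg

MAX_TURNS = 5  # fixed upper bound; real conversations vary in length

# Sketch only: approximating turn-level feedback by pre-declaring one
# optional rating question per possible turn, plus a correction box.
dataset = rg.FeedbackDataset(
    fields=[rg.TextField(name="conversation", use_markdown=True)],
    questions=[
        # 1. conversation-level metric
        rg.RatingQuestion(name="conversation_rating", values=[1, 2, 3, 4, 5]),
        # 2. per-turn metrics (optional, since later turns may not exist)
        *[
            rg.RatingQuestion(
                name=f"turn_{i}_rating",
                values=[1, 2, 3, 4, 5],
                required=False,
            )
            for i in range(1, MAX_TURNS + 1)
        ],
        # 2. free-text correction/rewrite of a response
        rg.TextQuestion(name="corrected_response", required=False),
    ],
)
```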

Thanks!

ttamg avatar Jul 13 '23 06:07 ttamg

@MoritzLaurer @ttamg, thanks for the context. Any feedback and context are always welcome.

davidberenstein1957 avatar Aug 07 '23 10:08 davidberenstein1957

This issue is stale because it has been open for 90 days with no activity.

github-actions[bot] avatar Nov 06 '23 01:11 github-actions[bot]

This issue is stale because it has been open for 90 days with no activity.

github-actions[bot] avatar Jun 30 '24 01:06 github-actions[bot]

@dchichkov @ttamg @MoritzLaurer We're currently working towards a ChatField for Argilla datasets that would look similar to the output of our current chat_to_html helper function. Have any of you tried it? If so, I'd love to learn about your experience with it. If you're happy to be contacted, drop me an email at [email protected]
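
In case it helps, a minimal usage sketch; the import path and the role/content message format are assumptions based on the current docs, so please double-check the exact signature there:

```python
# Sketch of rendering a conversation with the chat_to_html helper mentioned
# above, so it can be placed in a markdown-enabled field. Import path and
# message format are assumed; see the Argilla docs for the exact signature.
from argilla.markdown import chat_to_html

messages = [
    {"role": "user", "content": "What are the colors of the bus in the image?"},
    {"role": "assistant", "content": "The bus in the image is white and red."},
]
html_conversation = chat_to_html(messages)
print(html_conversation)  # HTML snippet rendered as chat bubbles in the UI
```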

nataliaElv avatar Aug 02 '24 08:08 nataliaElv