Open-Assistant
convert scrolls dataset to multipart dialog
The scrolls dataset is interesting because it is a long-text dataset. The task is to experiment with breaking the information into dialog paths, including the final task (QA, summarization, NLI): instruction->answer, instruction->answer, ... final instruction->answer. Ideally, we would like to prime the assistant to reason over long dialog paths.
https://www.scrolls-benchmark.com/
Is there anywhere I can look up how datasets are supposed to be structured for open-assistant?
I saw the deck discussing the conversation tree data structure, but I'm not sure how we actually want the datasets formatted / structured.
Do we have any examples I can use as a reference?
I assigned this to you. You can do it kind of like this:
User: I'm reading a [article|story] and need to summarize it. Can you help me with reading parts of it and then help with the summary? Here is the introduction: {text}
Assistant: I'm happy to help. This introduction is about {either get the summary from the answer or generate it using t5-large}
User: I need to also understand this part: {next section} What do you think about ... {generated topic question using question generator-answer}
Assistant: This section is about XYZ. In answer to your question about {topic}, {answer}
...
User: Now summarize all the parts above into a coherent final summary.
Assistant: The final summary is: {actual summary from the scrolls dataset}
You can do variations of these types of dialog paths in order to simulate long range tasks.
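For reference, here is a rough Python sketch of how the template above could be filled in for one document. It is only an illustration, not the project's actual conversion code: `build_dialog` and `render` are hypothetical names, and the per-section summaries and question/answer pairs are assumed to be computed beforehand (for example with t5-large or a question generator) and passed in as plain strings.

```python
from typing import List, Optional, Tuple

Turn = Tuple[str, str]  # (speaker, text)


def build_dialog(
    sections: List[str],
    section_summaries: List[str],
    qa_pairs: List[Optional[Tuple[str, str]]],
    final_summary: str,
    kind: str = "article",
) -> List[Turn]:
    """Turn one long document into a multi-turn summarization dialog.

    Hypothetical helper: all inputs are assumed to be precomputed strings.
    qa_pairs[i] is an optional (question, answer) pair for sections[i];
    the entry for the introduction (index 0) is ignored.
    """
    if not sections:
        raise ValueError("need at least one section")
    if not (len(sections) == len(section_summaries) == len(qa_pairs)):
        raise ValueError("sections, section_summaries and qa_pairs must align")

    turns: List[Turn] = []

    # Opening turn: state the task and hand over the introduction.
    turns.append((
        "User",
        f"I'm reading a {kind} and need to summarize it. Can you help me "
        "with reading parts of it and then help with the summary? "
        f"Here is the introduction: {sections[0]}",
    ))
    turns.append((
        "Assistant",
        f"I'm happy to help. This introduction is about {section_summaries[0]}",
    ))

    # Middle turns: one instruction->answer pair per remaining section.
    for section, summary, qa in zip(
        sections[1:], section_summaries[1:], qa_pairs[1:]
    ):
        user = f"I need to also understand this part: {section}"
        reply = f"This section is about {summary}"
        if qa is not None:
            question, answer = qa
            user += f" {question}"
            reply += f" In answer to your question, {answer}"
        turns.append(("User", user))
        turns.append(("Assistant", reply))

    # Final turn: the original scrolls task (here, summarization).
    turns.append((
        "User",
        "Now summarize all the parts above into a coherent final summary.",
    ))
    turns.append(("Assistant", f"The final summary is: {final_summary}"))
    return turns


def render(turns: List[Turn]) -> str:
    """Flatten the turns into 'Speaker: text' lines."""
    return "\n".join(f"{speaker}: {text}" for speaker, text in turns)
```

Calling `render(build_dialog(...))` gives one dialog path as text. Variations (different opening prompts, skipping the question for some sections, a QA or NLI final turn instead of a summary) would only need changes to the strings in the opening and final turns.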
This seems to be an issue from before OIG was created. Was this task completed as part of OIG?