Open-Assistant
Constructing Wikihow for QA with Metadata and Different Response Format
This issue explores whether building a QA dataset from WikiHow is feasible and worthwhile.
Existing datasets built on WikiHow:
- Summarisation: https://arxiv.org/pdf/1810.09305.pdf
- Commonsense: https://arxiv.org/abs/1905.07830
- Subset of QA: https://huggingface.co/datasets?search=wikihow
Proposed Data
- the full WikiHow dataset
- an added category for instruction/fulfilment
- different response formats: list, paragraph, even table (this will involve developing some question templates)
Possible Augmentation:
- Paraphrasing questions and answers
- Summarising answers from steps to short text (list-to-paragraph conversion)
Some open questions:
- How can we create negative samples that make sense for evaluating different dimensions (e.g. Not Helpful, Harmful, Not Honest)? Randomly drawn negative samples would be too far off to be useful.
Looks great. I would add that it could be an "instruction/answer" dataset as well as a QA dataset. Sometimes the prompt is not a question per se. Looking forward to your contributions!
I've only been able to find the full wikihowAll.csv in one location, and it seems to require a manual download. I'm not sure why it isn't available in full on Hugging Face (unless I'm mistaken); using it there requires having the file at a local path. It's definitely possible to get, though.
From exploring it, there's definitely scope for changing the response format. For example, the "headline" column is basically a list of the bold text from an article, which works really well as a bullet-point summary.
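A minimal sketch of that bullet-point conversion, using only the standard library. It assumes the "headline" field holds one bold step per line, possibly with a trailing comma, and that the CSV also has "title" and "text" columns; the in-memory sample is a stand-in for the real file, and `headline_to_bullets` is a hypothetical helper name.

```python
import csv
import io


def headline_to_bullets(headline: str) -> str:
    """Turn a newline-separated headline string into a '- ' bullet list."""
    steps = []
    for line in headline.splitlines():
        # Assumption: each line is one bold step, possibly followed by a comma.
        step = line.strip().rstrip(",").strip()
        if step:
            steps.append(step)
    return "\n".join(f"- {step}" for step in steps)


# In-memory stand-in for the real file; with a local copy one would use
# open("wikihowAll.csv", newline="", encoding="utf-8") instead.
SAMPLE_CSV = '''headline,title,text
"
Measure your table.,
Check online options.,
",How to Buy a Tablecloth,"Note down the measurements by width, length and height."
'''

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
bullets = headline_to_bullets(rows[0]["headline"])
print(bullets)
```

If the real separators turn out to differ, only the per-line cleanup inside `headline_to_bullets` should need adjusting.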
@b-mc2 Yes, I also managed to get the full CSV from WikiHow-Dataset. This task is then more about curating and expanding the dataset creatively. To elaborate, some of my initial ideas are:
- Develop question templates and expected answers.
  - can you suggest [TASK] step by step? List[STEP_SUMMARY]
  - can you suggest [TASK] step by step with details and present in a table? A table like List[[STEP_SUMMARY, STEP_DETAILS]]
  - can you suggest [TASK] in details?
  - can you suggest [TASK] briefly?
  - can you elaborate more on [STEP_SUMMARY]? [STEP_DETAILS]
  - few hops: can you suggest [TASK] step by step? List[STEP_SUMMARY] can you elaborate more on [STEP_SUMMARY]? [STEP_DETAILS]
  - few hops: can you suggest [TASK] step by step? List[STEP_SUMMARY] can you suggest an alternative way? [STEP_SUMMARY2]
- Question grouping e.g. (How to Be a Visual Effects Artist1, How to Be a Visual Effects Artist2)
- Question clustering and meta-labeling (add topic)
- Paraphrasing
- Backtranslation
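The templates above could be filled roughly as follows. This is only a sketch under stated assumptions: `Step`, `step_list`, `step_table`, `elaborate` and `few_hops` are hypothetical names, the table is rendered as a markdown table, and each article is assumed to be already split into (summary, details) pairs.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Step:
    summary: str  # fills [STEP_SUMMARY]
    details: str  # fills [STEP_DETAILS]


Turn = Tuple[str, str]  # (prompt, expected answer)


def step_list(task: str, steps: List[Step]) -> Turn:
    """[TASK] step by step -> bullet list of step summaries."""
    prompt = f"Can you suggest {task} step by step?"
    return prompt, "\n".join(f"- {s.summary}" for s in steps)


def step_table(task: str, steps: List[Step]) -> Turn:
    """[TASK] as a table -> one row per [STEP_SUMMARY, STEP_DETAILS] pair."""
    prompt = f"Can you suggest {task} step by step with details and present in a table?"
    lines = ["| Step | Details |", "| --- | --- |"]
    lines += [f"| {s.summary} | {s.details} |" for s in steps]
    return prompt, "\n".join(lines)


def elaborate(step: Step) -> Turn:
    """Follow-up question about a single step -> that step's details."""
    topic = step.summary.rstrip(".")
    return f'Can you elaborate more on "{topic}"?', step.details


def few_hops(task: str, steps: List[Step], index: int) -> List[Turn]:
    """Two-turn dialog: list the steps, then elaborate on steps[index]."""
    return [step_list(task, steps), elaborate(steps[index])]


steps = [
    Step("Measure your table.", "Note down the measurements by width, length and height."),
    Step("Check online options.", "(details omitted in this sketch)"),
]
dialog = few_hops("how to buy a tablecloth", steps, index=0)
for prompt, answer in dialog:
    print(prompt)
    print(answer)
```

Returning plain (prompt, answer) tuples keeps the multi-hop case a simple list of turns, so paraphrasing or backtranslation could later be applied per turn.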
I know you are also working on this and you may be ahead of me. Let's exchange ideas and collaborate :smiley:
Excellent! These are awesome templates. Looking forward to your experiments and data.
That's what my notebook can generate now. I am still fine-tuning the formatting/cleaning and the prompt types, and have yet to add more templates.
Prompt types so far. Feel free to suggest.
- brief
- detailed
- step
brief
Prompt:
Can you suggest how to buy a tablecloth briefly?
Response:
Measure your table. Decide how you want the tablecloth to look when it is on the table. Use an online tablecloth calculator. Consider what elements of the tablecloth are important to you. Check online options. Consider older tablecloths when you want something fancy and elegant. Find the antique or vintage tablecloths. Know what to look for.
detailed
Prompt:
Can you suggest how to buy a tablecloth in details?
Response:
Note down the measurements by width, length and height. For example: A 6 foot banquet table measuring 30 inches wide by 72 inches long by 29 inches high.; Do you want the tablecloth to [TRUNCATED]
step
Prompt:
Can you suggest how to buy a tablecloth step by step?
Response:
- Measure your table.
- Decide how you want the tablecloth to look when it is on the table.
- Use an online tablecloth calculator.
- Consider what elements of the tablecloth are important to you.
- Check online options.
- Consider older tablecloths when you want something fancy and elegant.
- Find the antique or vintage tablecloths.
- Know what to look for.
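For reference, the three prompt types could be produced by something like the sketch below. It is not the notebook's actual code: `build_pair` and `PROMPTS` are hypothetical names, and joining step details with a single space for the "detailed" type is an assumption.

```python
from typing import Dict, List, Tuple

# Prompt wording copied from the examples above; {task} is e.g. "how to buy a tablecloth".
PROMPTS: Dict[str, str] = {
    "brief": "Can you suggest {task} briefly?",
    "detailed": "Can you suggest {task} in details?",
    "step": "Can you suggest {task} step by step?",
}


def build_pair(
    prompt_type: str, task: str, summaries: List[str], details: List[str]
) -> Tuple[str, str]:
    """Return (prompt, response) for one article and one prompt type."""
    if prompt_type not in PROMPTS:
        raise ValueError(f"unknown prompt type: {prompt_type!r}")
    prompt = PROMPTS[prompt_type].format(task=task)
    if prompt_type == "brief":
        # Step summaries run together as one short paragraph.
        response = " ".join(summaries)
    elif prompt_type == "detailed":
        # Assumption: the detailed answer is the step details joined by spaces.
        response = " ".join(details)
    else:
        # "step": one bullet per step summary.
        response = "\n".join(f"- {s}" for s in summaries)
    return prompt, response


summaries = ["Measure your table.", "Use an online tablecloth calculator.", "Check online options."]
details = ["Note down the measurements by width, length and height."]
for prompt_type in PROMPTS:
    prompt, response = build_pair(prompt_type, "how to buy a tablecloth", summaries, details)
    print(prompt)
    print(response)
```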
Oh... This is very good!
Maybe add a follow-up dialog such as "I don't like the last (first, etc.) step, can you remove it?"
And/or "Can you summarize the steps for me?", and run it through a basic summarizer such as T5.
Or "Can you suggest how to buy a tablecloth, step by step, with a [summary/brief description] first?"
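The "remove a step" follow-up could be generated along these lines. This is a rough sketch, with `remove_step_followup` as a hypothetical helper that only handles the "first" and "last" positions.

```python
from typing import List, Tuple


def remove_step_followup(steps: List[str], position: str) -> Tuple[str, str]:
    """Build a follow-up turn that drops the first or last step from a step list."""
    if not steps:
        raise ValueError("need at least one step")
    offsets = {"first": 0, "last": -1}
    if position not in offsets:
        raise ValueError(f"unsupported position: {position!r}")
    drop = offsets[position] % len(steps)  # normalise -1 to the last index
    remaining = [s for i, s in enumerate(steps) if i != drop]
    prompt = f"I don't like the {position} step, can you remove it?"
    response = "\n".join(f"- {s}" for s in remaining)
    return prompt, response


steps = ["Measure your table.", "Check online options.", "Know what to look for."]
prompt, response = remove_step_followup(steps, "last")
print(prompt)
print(response)
```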
Closing old data issue.