
Constructing Wikihow for QA with Metadata and Different Response Format

Open kenhktsui opened this issue 2 years ago • 4 comments

This issue explores whether it is possible, and necessary, to build a QA dataset based on WikiHow.

Existing datasets built on WikiHow:

  • Summarisation: https://arxiv.org/pdf/1810.09305.pdf
  • Commonsense: https://arxiv.org/abs/1905.07830
  • Subset of QA: https://huggingface.co/datasets?search=wikihow

Proposed Data

  • the full WikiHow dataset
  • an added category for instruction/fulfilment
  • different response formats: list, paragraph, even table (this will involve developing some question templates)

Possible Augmentation:

  • Paraphrasing questions and answers
  • Summarising answers from steps to short text (list to para conversion)

Some Open questions:

  • How can we create negative samples that make sense for evaluating different dimensions (e.g. Not Helpful, Harmful, Not Honest)? Random negative samples would be too far off.

kenhktsui avatar Jan 05 '23 01:01 kenhktsui

Looks great. I would add that it could be an "instruction/answer" dataset as well as a QA dataset. Sometimes the prompt is not a question per se. Looking forward to your contributions!

huu4ontocord avatar Jan 05 '23 01:01 huu4ontocord

I've only been able to find the full wikihowAll.csv in one location, which seems to require a manual download. I'm not sure if there's some reason it isn't available in full on Hugging Face, but (unless I'm mistaken) using it there requires having the file in a local path. It's definitely possible to get, though.

From exploring it, there's definitely a possibility for changing the response format. For example, the "headline" column is basically a list of the bold text from an article, which works really well as a bullet-point summary.
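A minimal sketch of that idea, assuming the CSV has `headline`, `title` and `text` columns and that each bold step in `headline` sits on its own line. The sample row is made up for illustration and the helper name `headline_to_bullets` is hypothetical:

```python
import csv
import io

# Hypothetical stand-in for wikihowAll.csv. The column names and the
# newline-separated layout of "headline" are assumptions, not verified
# against the real file.
SAMPLE_CSV = (
    "headline,title,text\n"
    '"\nMeasure your table.\nCheck online options.\n",'
    '"How to Buy a Tablecloth","Note down the measurements."\n'
)


def headline_to_bullets(headline: str) -> str:
    """Turn a newline-separated headline field into a bullet-point summary."""
    steps = [line.strip() for line in headline.splitlines()]
    # Drop empty lines left over from leading/trailing newlines.
    return "\n".join(f"- {step}" for step in steps if step)


# csv.DictReader handles the quoted, multi-line "headline" field.
rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
bullets = headline_to_bullets(rows[0]["headline"])
print(bullets)
```

For the real file, the `io.StringIO` object would be replaced by the opened local CSV.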

b-mc2 avatar Jan 05 '23 02:01 b-mc2


@b-mc2 Yes, I also managed to get the full CSV from WikiHow-Dataset. This task is then more about curating and expanding the dataset creatively. To elaborate, some of my initial ideas are:

  • Develop question templates and expected answer.
    • can you suggest [TASK] step by step? List[STEP_SUMMARY]
    • can you suggest [TASK] step by step with details and present in a table? A table like List[[STEP_SUMMARY, STEP_DETAILS]]
    • can you suggest [TASK] in details?
    • can you suggest [TASK] briefly?
    • can you elaborate more on [STEP_SUMMARY]? [STEP_DETAILS]
    • few hops: can you suggest [TASK] step by step? List[STEP_SUMMARY] can you elaborate more on [STEP_SUMMARY]? [STEP_DETAILS]
    • few hops: can you suggest [TASK] step by step? List[STEP_SUMMARY] can you suggest an alternative way? [STEP_SUMMARY2]
  • Question grouping e.g. (How to Be a Visual Effects Artist1, How to Be a Visual Effects Artist2)
  • Question clustering and meta-labeling (add topic)
  • Paraphrasing
  • Backtranslation
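As a rough sketch of how the question templates above could be filled in, assuming each article has already been split into parallel lists of step summaries and step details. The template keys, function names and the second detail string are hypothetical placeholders, and the table is rendered as a Markdown table purely for illustration:

```python
from typing import Dict, List, Tuple

# Hypothetical template table; the wording follows the templates listed above.
QUESTION_TEMPLATES: Dict[str, str] = {
    "step": "Can you suggest {task} step by step?",
    "table": "Can you suggest {task} step by step with details and present in a table?",
    "elaborate": 'Can you elaborate more on "{step_summary}"?',
}


def as_list(step_summaries: List[str]) -> str:
    """Render List[STEP_SUMMARY] as a bullet list."""
    return "\n".join(f"- {s}" for s in step_summaries)


def as_table(step_summaries: List[str], step_details: List[str]) -> str:
    """Render List[[STEP_SUMMARY, STEP_DETAILS]] as a two-column table."""
    header = "| Step | Details |\n| --- | --- |"
    rows = [f"| {s} | {d} |" for s, d in zip(step_summaries, step_details)]
    return "\n".join([header, *rows])


def few_hop_dialogue(
    task: str,
    step_summaries: List[str],
    step_details: List[str],
    elaborate_on: int = 0,
) -> List[Tuple[str, str]]:
    """Two-turn example: list the steps, then elaborate on one of them."""
    summary = step_summaries[elaborate_on].rstrip(".")
    return [
        (QUESTION_TEMPLATES["step"].format(task=task), as_list(step_summaries)),
        (
            QUESTION_TEMPLATES["elaborate"].format(step_summary=summary),
            step_details[elaborate_on],
        ),
    ]


steps = ["Measure your table.", "Check online options."]
details = [
    "Note down the measurements by width, length and height.",
    "(placeholder detail for the second step)",
]
dialogue = few_hop_dialogue("how to buy a tablecloth", steps, details)
for prompt, response in dialogue:
    print(f"Prompt:\n{prompt}\nResponse:\n{response}\n")
```

Keeping the templates in one dictionary would make it easy to add the "alternative way" hop or paraphrased variants later.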

I know you are also working on this and you may be ahead of me. Let's exchange ideas and collaborate :smiley:

kenhktsui avatar Jan 05 '23 03:01 kenhktsui

Excellent! These are awesome templates. Looking forward to your experiments and data.

huu4ontocord avatar Jan 05 '23 05:01 huu4ontocord

That's what my notebook can generate now. I am still fine-tuning the formatting/cleaning and the prompt types, and have yet to add more templates.

Prompt types so far (feel free to suggest more):

  • brief
  • detailed
  • step

brief

Prompt:
Can you suggest how to buy a tablecloth briefly?
Response:
Measure your table. Decide how you want the tablecloth to look when it is on the table. Use an online tablecloth calculator. Consider what elements of the tablecloth are important to you. Check online options. Consider older tablecloths when you want something fancy and elegant. Find the antique or vintage tablecloths. Know what to look for.

detailed

Prompt:
Can you suggest how to buy a tablecloth in details?
Response:
Note down the measurements by width, length and height. For example: A 6 foot banquet table measuring 30 inches wide by 72 inches long by 29 inches high.; Do you want the tablecloth to [TRUNCATED]

step

Prompt:
Can you suggest how to buy a tablecloth step by step?
Response:
- Measure your table.
- Decide how you want the tablecloth to look when it is on the table.
- Use an online tablecloth calculator.
- Consider what elements of the tablecloth are important to you.
- Check online options.
- Consider older tablecloths when you want something fancy and elegant.
- Find the antique or vintage tablecloths.
- Know what to look for.

kenhktsui avatar Jan 11 '23 13:01 kenhktsui

Oh... This is very good!

Maybe add a follow-up dialogue such as "I don't like the last (first, etc.) step, can you remove it?"

And/or "Can you summarize the steps for me?", with the answer produced by running the steps through a basic summarizer such as T5.

Or "Can you suggest how to buy a tablecloth, step by step, with a [summary/brief description] first?"
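A quick sketch of the "remove a step" follow-up, assuming the steps are already available as a list. The function name and the set of supported positions are made up for illustration:

```python
from typing import List, Tuple

# Hypothetical mapping from the word used in the prompt to a list index.
POSITIONS = {"first": 0, "last": -1}


def remove_step_follow_up(steps: List[str], position: str) -> Tuple[str, str]:
    """Build a follow-up turn that asks to drop one step, plus the new answer."""
    if not steps:
        raise ValueError("need at least one step")
    # Normalise negative indices so -1 means the final step.
    index = POSITIONS[position] % len(steps)
    remaining = [s for i, s in enumerate(steps) if i != index]
    prompt = f"I don't like the {position} step, can you remove it?"
    response = "\n".join(f"- {s}" for s in remaining)
    return prompt, response


example_steps = [
    "Measure your table.",
    "Check online options.",
    "Know what to look for.",
]
follow_up_prompt, follow_up_response = remove_step_follow_up(example_steps, "last")
print(follow_up_prompt)
print(follow_up_response)
```

The summarizer variant would need a model call, so it is left out of this sketch.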

huu4ontocord avatar Jan 11 '23 14:01 huu4ontocord

Closing old data issue.

andreaskoepf avatar Jun 14 '23 08:06 andreaskoepf