Open-Assistant Creating augmented data using few-shot prompts for explanations of jokes, logical inferences, etc.

See https://www.lesswrong.com/posts/EHbJ69JDs4suovpLw/testing-palm-prompts-on-gpt3.

Try 2-, 3-, or 4-shot inference on a model such as JT, GPT-NeoX-20B, or Galactica.
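A minimal sketch of how a k-shot prompt for joke explanations could be assembled before sending it to whichever model is being tested. The function name `build_few_shot_prompt`, the `Joke:` / `Explanation:` field labels, and the placeholder exemplars are assumptions for illustration, not something fixed by this issue; the model call itself is left out.

```python
from typing import List, Tuple

# Placeholder exemplars; in practice these would be hand-written (joke, explanation) pairs.
EXAMPLES: List[Tuple[str, str]] = [
    ("<joke 1>", "<explanation 1>"),
    ("<joke 2>", "<explanation 2>"),
    ("<joke 3>", "<explanation 3>"),
]


def build_few_shot_prompt(
    examples: List[Tuple[str, str]], query_joke: str, k: int = 3
) -> str:
    """Return a k-shot prompt: k worked examples followed by the open query."""
    if not 1 <= k <= len(examples):
        raise ValueError("k must be between 1 and the number of examples")
    blocks = [
        f"Joke: {joke}\nExplanation: {explanation}"
        for joke, explanation in examples[:k]
    ]
    # The last block leaves the explanation empty so the model completes it.
    blocks.append(f"Joke: {query_joke}\nExplanation:")
    return "\n\n".join(blocks)


if __name__ == "__main__":
    # Compare 2-, 3-, and (with more exemplars) 4-shot variants by changing k.
    print(build_few_shot_prompt(EXAMPLES, "<new joke>", k=2))
```

Changing `k` makes it easy to compare the different shot counts against the same set of exemplars.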

After we find a promising model and configuration, we can scrape the web for jokes and paragraphs with logical inferences to create dialog data:

Human: Tell me a joke about {extract keywords from joke} Assistant: {joke} Human: Explain the joke. Assistant: {explanation}
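A rough sketch of turning one (joke, explanation) pair into the dialog format above. The keyword step here is a deliberately naive stopword filter, and `STOPWORDS`, `extract_keywords`, and `to_dialog` are hypothetical names; a real pipeline would likely use a proper keyword extractor.

```python
import re
from typing import List

# Tiny illustrative stopword list (an assumption, not a complete one).
STOPWORDS = {
    "a", "an", "the", "did", "do", "does", "why", "what", "how", "to", "of",
    "and", "it", "is", "was", "in", "on", "i", "my", "you", "your", "that",
}


def extract_keywords(joke: str, max_keywords: int = 2) -> List[str]:
    """Return the first few distinct non-stopword words of the joke."""
    keywords: List[str] = []
    for word in re.findall(r"[a-z']+", joke.lower()):
        if word not in STOPWORDS and len(word) > 2 and word not in keywords:
            keywords.append(word)
    return keywords[:max_keywords]


def to_dialog(joke: str, explanation: str) -> str:
    """Format a joke and its explanation as a two-turn Human/Assistant dialog."""
    topic = " and ".join(extract_keywords(joke)) or "anything"
    return (
        f"Human: Tell me a joke about {topic}\n"
        f"Assistant: {joke}\n"
        "Human: Explain the joke.\n"
        f"Assistant: {explanation}"
    )


if __name__ == "__main__":
    print(to_dialog(
        "Why did the chicken cross the road? To get to the other side.",
        "<generated explanation>",
    ))
```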

See also https://storage.googleapis.com/pathways-language-model/PaLM-paper.pdf

Jan 02 '23 07:01 huu4ontocord

Adding explanations to the end of existing instruction-dataset answers where the answers are classifications (see P3, Natural Instructions, etc.):

For example,

This is a movie review for the movie {movie}: {review}. This movie review is {classification} because ...[your created answer]

This is a movie review for the movie {movie}: {review}. This movie review is {classification} because ...[generated answer]
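One way to read the two lines above is as a few-shot prompt: hand-written exemplars that end in a created explanation, followed by a new review that stops at "because" so the model generates the rest. A minimal sketch under that assumption; `TEMPLATE`, `build_explanation_prompt`, and the exemplar dict keys are hypothetical names.

```python
from typing import Dict, List

TEMPLATE = (
    "This is a movie review for the movie {movie}: {review}. "
    "This movie review is {classification} because"
)


def _fill(movie: str, review: str, classification: str) -> str:
    # Drop a trailing period so the template does not produce "..".
    return TEMPLATE.format(
        movie=movie,
        review=review.strip().rstrip("."),
        classification=classification,
    )


def build_explanation_prompt(
    exemplars: List[Dict[str, str]], movie: str, review: str, classification: str
) -> str:
    """Exemplars carry a hand-written explanation; the final item is left open."""
    parts = [
        _fill(ex["movie"], ex["review"], ex["classification"]) + " " + ex["explanation"]
        for ex in exemplars
    ]
    parts.append(_fill(movie, review, classification))
    return "\n\n".join(parts)


if __name__ == "__main__":
    # Made-up exemplar purely for illustration.
    exemplars = [{
        "movie": "Movie A",
        "review": "Great fun.",
        "classification": "positive",
        "explanation": "the reviewer calls it great fun.",
    }]
    print(build_explanation_prompt(exemplars, "Movie B", "Dull and slow", "negative"))
```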

We can also follow this up with explanations for other "hard" things like:

riddles, poems (metaphors), analogies, and songs.

Jan 05 '23 00:01 huu4ontocord

Going with the movie reviews idea, could we use the Rotten Tomatoes dataset to generate prompts, and maybe supplement it with one of the models fine-tuned on it as well?

https://huggingface.co/datasets/rotten_tomatoes
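A small sketch of mapping rows from that dataset to explanation prompts. It assumes each row has a `text` field and an integer `label` field with 0 = negative and 1 = positive, and that there is no movie-title field, so the `{movie}` slot is dropped here; please check the dataset card before relying on either assumption. `records_to_prompts` and the sample rows are made up for illustration.

```python
from typing import Dict, Iterable, Iterator

# Assumed label encoding; verify against the dataset card.
LABEL_NAMES = {0: "negative", 1: "positive"}


def records_to_prompts(records: Iterable[Dict]) -> Iterator[str]:
    """Yield one open-ended explanation prompt per review record."""
    for record in records:
        # Normalise trailing punctuation/whitespace before adding our own period.
        review = record["text"].strip().rstrip(".").strip()
        label = LABEL_NAMES[record["label"]]
        yield f"This is a movie review: {review}. This movie review is {label} because"


if __name__ == "__main__":
    # With the Hugging Face `datasets` library installed, rows could come from:
    #   from datasets import load_dataset
    #   records = load_dataset("rotten_tomatoes", split="train")
    # Here we use made-up rows so the sketch runs offline.
    records = [
        {"text": "a moving film .", "label": 1},
        {"text": "tedious and overlong", "label": 0},
    ]
    for prompt in records_to_prompts(records):
        print(prompt)
```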

Jan 05 '23 03:01 smytjf11

The idea is to create a dataset with explanations. For example, take the movie dataset and do this:

This is a movie review for the movie {movie}: {review}. This movie review is {classification} because ...[your created answer]

Am I right? I'm interested in picking this up. How large should the dataset be?

Jan 17 '23 18:01 momegas

@momegas Yes. If it is very compute-intensive, it doesn't need to be large. Maybe see if you can get it to work first, and then we can discuss size. We can run it on some extra compute.

Jan 22 '23 04:01 huu4ontocord

Sounds like a very cool task, and I would love to give it a try if it is still relevant :) @ontocord

Mar 20 '23 09:03 mikegarts

@ontocord I'd like to give it a try. Can you tell me your name on Discord? Maybe we can talk a little more there. @mikegarts Maybe we can work on it together? More data is better for this project. My name on Discord is QiKo.

Mar 23 '23 15:03 kkie02

@kkie02 Sure, I'm on Discord as mikegarts. Feel free to ping me. By the way, I just opened a PR with a somewhat relevant instruction dataset, https://github.com/LAION-AI/Open-Assistant/pull/2209, but would love to cooperate on further work.

Mar 25 '23 14:03 mikegarts

I'm going to work in this area, but on more specific tasks (semantics, logic, reasoning): https://github.com/LAION-AI/Open-Assistant/issues/3122

May 10 '23 20:05 echo0x22

Closing old data issue.

Jun 14 '23 08:06 andreaskoepf
