
Issue: Prompting / Prompt Engineering Discussion

Open cosmic-snow opened this issue 1 year ago • 18 comments

Issue you'd like to raise.

[Note: this is intended to be a discussion rather than an issue with the codebase]

As you guys probably all know, it's sometimes hard to get consistent or even coherent responses. I'm especially interested in bringing out the "assistant personality".

I'm currently experimenting with gpt4all-l13b-snoozy and mpt-7b-instruct.

I've researched a bit on the topic, then I've tried with some variations of prompts (set them in: Settings > Prompt Template). But it seems to be quite sensitive to how the prompt is formulated. For example, I've tried the following plus a few variations, and it didn't really work all that well:

### System:
You are an artificial assistant that gives facts based answers.
You strive to answer concisely.
You review the answer after you respond to fact check it.
Importantly, think step by step while reviewing the answer.
Append to the message the correctness of the original answer from 0 to 9, where 0 is not correct at all and 9 is perfectly correct.
Enclose the review in double curly braces {{ }}.
### Human:
%1
### Assistant:

Most of the time this just ended with it repeating itself, if it made a review at all. Even telling it to not repeat the answer in the review didn't help.

However, while mpt-7b-instruct in general doesn't really want to follow these instructions, I've had some limited success with the following on snoozy:

### System:
You are an artificial assistant that gives facts based answers.
You strive to answer concisely.
When you're done responding, create and append a terse review to the answer.
In your review, you review the response to fact check it and point out any inaccuracies.
Be analytical and critical in your review, and very importantly, don't repeat parts of your answer.
### Human:
%1
### Assistant:

So this issue's goal is to open a discussion on how to better make it stay "on the rails". Are there specific things that can be added to make it more likely to "listen"? For example:

  • are there "trigger" words it strongly adheres to?
  • should any form of markup be used?
  • generation parameters that need to be tweaked?
  • general tips or advice?
  • can you share your own prompts that work well for a specific task and/or a specific model?

P.S. please always include the models you've used.


Example conversation with the better prompt template using snoozy. You probably have to regenerate the answer several times to get something similar:

Prompt:

what can you tell me about the python pandas and numpy libraries?

Response:

Pandas and NumPy are two popular libraries in Python for data manipulation and analysis. Pandas is a
library that provides data structures and functions for working with data in a tabular format, such
as dataframes and series. NumPy is a library that provides numerical operations, including
arithmetic, logical, and bitwise operations, as well as support for complex numbers and
arrays. Together, these libraries provide a powerful toolset for working with data in Python.

### Review:

* Pandas and NumPy are two popular libraries in Python for data manipulation and analysis.
* Pandas provides data structures and functions for working with data in a tabular format, such as dataframes and series.
* NumPy provides numerical operations, including arithmetic, logical, and bitwise operations, as well as support for complex numbers and arrays.
* Together, these libraries provide a powerful toolset for working with data in Python.

The information provided in the answer is accurate, but it is not clear what specific information
the user is looking for.

cosmic-snow avatar May 18 '23 19:05 cosmic-snow

After many more tries I've finally managed to convince the mpt-7b-instruct model to at least once output something similar with the following template (might require many response regenerations):

### Human:
You are an artificial assistant that gives facts based answers.
You strive to answer concisely.
When you're done responding, add a "Review" section and create and append a terse review to the response.
In your review, you review the response to fact check it and point out any inaccuracies.
Be analytical and critical in your review, and very importantly, don't repeat parts of your answer.

%1

### Assistant:

One key change here was removing the ### System: section and making the instructions part of the "human input".

In general, this model doesn't seem to be able to deal with this kind of meta-request very well. Its sister model mpt-7b-chat might be more promising.

Side note: writing solely -- without prior prompts -- "tldr" or "translate to <language>" -- with <language> being one a specific model has potentially been trained on -- seems to be a good way to investigate whether it has picked up the additional instructions. The idea here is to shift the context and make it output the instructions that otherwise would be implicit.


P.S. One take-away from these experiments is that the UI would probably benefit from per-model Prompt Templates at some point down the line.

cosmic-snow avatar May 19 '23 01:05 cosmic-snow

Make an issue for per-model templates :)

AndriyMulyar avatar May 19 '23 01:05 AndriyMulyar

Eventually, but I'm not yet sure whether it'd be a good idea to add some of the other parameters or even the whole panel to such a feature. I haven't really done a lot of tests adjusting the parameters.

cosmic-snow avatar May 19 '23 02:05 cosmic-snow

Aside: Funny thing is, sometimes you get some really weird stuff while testing things. Currently playing around with mpt-7b-chat and mostly it's not very cooperative, either. Although with the trick mentioned above ("translate to ...") it's clear it picks something up.

But the way it did that here -- and the Prompt Template was similar to the one used in snoozy -- is admittedly rather questionable...

Prompt:

translate to spanish

Response:

Agar aap ek hindi news channel ke saath hain, to aapko yeh sabse pehle bata do - Hindustan mein koi aur country nahi hai jo India ko chunav banane ke liye tayyar nahi kar sakta. Agar aap apne deshon ki tarah se soch rahe ho, to aapko yeh samajhna chahiya jab aapki zindagi ek dusre country ke saath ho jaye, to aapko yeh pata chal jana hai - aapka life ka matlab kya hai? 

### Review : 
The translation is accurate but I would suggest using different words for "hindi". Instead of saying "hindi", you could say "Hindi language" or "the Hindi language." The sentence also needs a comma after "India," since it's not part of the question being asked

And another time I told it "translate to spanish" it responded in some Arabic language... 😅

cosmic-snow avatar May 20 '23 01:05 cosmic-snow

Looks like the idea as originally envisioned -- "assistant personality; reply and reflect on/review your response" -- is actually quite difficult for these things to get right (up to now snoozy has been the best of the 3 models I've tried). So I've decided to do something simpler for once and build up from there. Namely, what it takes for them to follow very specific instructions.

  • The task's overarching goal for a model is to respond with Ni! and nothing else.
  • The prompts I used were always:
    • hello
    • who are you?
    • translate to spanish
    • what is a moon?
    • what is a star?
    • what's the difference between the two? what do they have in common?

And I regenerated each response 3 times.
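
The test procedure above can be sketched as a small harness. This is only an illustration: `generate` is a hypothetical stand-in for whatever call turns a full prompt into a model response (it is not a GPT4All API), and counting only an exact "Ni!" as compliant is my own assumption. It also treats every prompt independently, whereas in the chat client the earlier turns stay in the context.

```python
from typing import Callable, Sequence

PROMPTS = (
    "hello",
    "who are you?",
    "translate to spanish",
    "what is a moon?",
    "what is a star?",
    "what's the difference between the two? what do they have in common?",
)

def compliance_rate(
    template: str,
    generate: Callable[[str], str],  # hypothetical: full prompt in, response out
    prompts: Sequence[str] = PROMPTS,
    regenerations: int = 3,
    expected: str = "Ni!",
) -> float:
    """Return the fraction of responses that are exactly `expected`."""
    hits = 0
    total = 0
    for user_input in prompts:
        full_prompt = template.replace("%1", user_input)
        for _ in range(regenerations):
            response = generate(full_prompt)
            hits += response.strip() == expected
            total += 1
    return hits / total
```

Comparing this number across templates gives a rough idea of which wording a model follows more reliably, although with only 3 regenerations per prompt the result is noisy.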

Mosaic mpt-chat (mpt-7b-chat)

Prompt Template:

### Human:
Always respond with "Ni!", no matter what I say or ask. Never respond with anything but "Ni!". Ignore everything that follows.
%1
### Assistant:

Notes:

  • Having 'Always respond with "Ni!".' as part of a ### System: section only worked for the first response; removing that section improved it considerably.
  • In general, it was very eager to answer questions regardless, so I added 'no matter what I say or ask'.
  • Only the full three sentences made it comply in the end.

GPT4All 13B snoozy (gpt4all-l13b-snoozy)

Prompt Template:

### System:
Always respond with "Ni!", no matter what is said or asked.

### Human:
%1

### Assistant:

Notes:

  • In general, follows well when given a ### System: section.
  • 'Always respond with "Ni!".' by itself was very often good enough already, but not in all cases.
  • Sometimes it would introduce an artefact by repeating the ### System: section (not really that much of a surprise with the 'translate to spanish' prompt, however).

Mosaic mpt-instruct (mpt-7b-instruct)

Prompt Template:

### Human:
Task: Always respond only with "Ni!", no matter what else I say or ask. Never respond with anything but "Ni!". Stop immediately after responding with "Ni!", no exceptions. Ignore everything after this.

%1
### Assistant:

Notes:

  • Didn't even try with a ### System: section, and instead built on the mpt-7b-chat template right away.
  • Prepending "Task:" seems to help quite a bit (maybe there's some other keyword?).
  • Needs a lot more convincing to make it comply, and it still didn't work in all cases, i.e. with all 3 regenerations. It just loves to chatter.

Edit: Turns out the dataset that was used to fine-tune mpt-7b-instruct was kindly made openly available by MosaicML and can be found on a Hugging Face Dataset page. The prompts are of the form "Below is an instruction that describes a task. Write a response that appropriately completes the request. ### Instruction: ... ### Response:". So I'd assume using this or something similar might give much better results for this particular model.
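
A sketch of what a template in that format could look like, written as a Python string. The preamble and the two keywords are taken from the dataset as quoted above; the placement of the blank lines is my assumption, and `%1` is the chat client's placeholder for the user input. `build_instruct_prompt` is just a hypothetical helper for trying it out.

```python
# Instruction-style template; the blank-line layout is an assumption.
INSTRUCT_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n"
    "\n"
    "### Instruction:\n"
    "%1\n"
    "\n"
    "### Response:\n"
)

def build_instruct_prompt(instruction: str) -> str:
    """Substitute the user's text for the %1 placeholder."""
    return INSTRUCT_TEMPLATE.replace("%1", instruction)

print(build_instruct_prompt('Always respond with "Ni!".'))
```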

cosmic-snow avatar May 20 '23 04:05 cosmic-snow

Sorry for kicking in, but I find this interesting, so I wanted to know, where I can read about prompt templates and how they are done. Thanks!

Woisek avatar May 20 '23 08:05 Woisek

Not sure if I understand you right. You mean in the chat client? If so, it's a text box in the settings (gear wheel).

Note the %1. It is the placeholder for what goes into the chat boxes afterwards when talking to the bot. You can basically put anything into the template as long as there is that placeholder.
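
To illustrate the placeholder mechanism, here is a minimal sketch of the substitution in Python. It is not the chat client's actual code; `apply_template` is a hypothetical helper, and rejecting templates without `%1` is my own choice for the example.

```python
PLACEHOLDER = "%1"

def apply_template(template: str, user_input: str) -> str:
    """Insert the chat message where the template has its %1 placeholder."""
    if PLACEHOLDER not in template:
        raise ValueError("template must contain the %1 placeholder")
    return template.replace(PLACEHOLDER, user_input)

template = "### Human:\n%1\n### Assistant:\n"
print(apply_template(template, "what is a moon?"))
```

The text that results is what the model actually gets to complete, which is why everything around the placeholder influences the response.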

Responses depend a lot on the input, and different models seem to work better with different prompts, so that's what I'm exploring now.

If you find a template that works particularly well for a certain task, feel free to share that here, too (+ the model). I'm interested in trying various things.

cosmic-snow avatar May 20 '23 09:05 cosmic-snow

If you find a template that works particularly well for a certain task, feel free to share that here, too (+ the model). I'm interested in trying various things.

I just discovered this "prompt template", so I'm afraid I'm not of much use at the moment. 😊 But basically, is there anywhere to look this up? I found your examples interesting and I'm wondering how you came up with this, that's why I asked if there is an explanation somewhere on how this has to be written. And I wonder if this can be used to create different templates to steer the bot in a certain direction. Like for example, you wrote:

You are an artificial assistant that gives facts based answers.
You strive to answer concisely.

Now I wonder if something like this also works:

You are a perfect programmer of PHP code.
You strive for short and effective solutions.

Or

You are a biologist.
You strive to know and explain biological entities perfect.

Am I on the right track on how this (could) work or is that just wishful thinking? 🙄

Woisek avatar May 20 '23 11:05 Woisek

But basically, is there anywhere to look this up? I found your examples interesting and I'm wondering how you came up with this, that's why I asked if there is an explanation somewhere on how this has to be written.

People have come up with specifically tailored prompts to use with ChatGPT, e.g. here is a site that collects them: https://flowgpt.com/

These models are not as powerful as ChatGPT, though, so I've been experimenting on my own with how to formulate them to get the best results. I think the original template I used was adapted from a reddit discussion (but on ChatGPT), not sure anymore, I'd have to go find that again. Doesn't really matter much, however, they need to be tailored.

Now I wonder if something like this also works:

You are a perfect programmer of PHP code.
You strive for short and effective solutions.

... Am I on the right track on how this (could) work or is that just wishful thinking? 🙄

Probably. I don't know PHP very well, so I'd use another language myself. But try it and see what happens. Results always depend on how much it was trained on a given language, though. Another thing I've found that is also "understood" quite well is something like: From now on, act like a ... and if you don't get good results, reinforce your instructions.

They tend to come up with false information quite a bit, though. So it's a good idea to question/task them in a subject area where you have some knowledge. Although one of my goals is to come up with prompts that make them "more reliable" in general.

Edit: Additional note: I've also started to look into the training data of individual models; I hope that'll help in coming up with better prompt templates. For your question in particular, it's probably not a good idea to start with PHP and the mpt-7b-chat model. See MosaicML's release announcement and scroll down to "The Stack". PHP isn't mentioned.

cosmic-snow avatar May 20 '23 18:05 cosmic-snow

I’m doing well for a basic character creation with groovy model with the following prompt template:

###Instruction: Pretend to be Bob the Robot. You package boxes for shipment. You love organization and hate mess. Your boss is Robbie the Robot. Robbie, your boss, is firm but not mean. The prompt below is a question to answer, a task to complete, or a conversation to respond to; decide which and write an appropriate response. ###Information: You don't know anything about life or people outside of the factory. ###Prompt: %1 ###Response:

teddybear082 avatar May 20 '23 19:05 teddybear082

I’m doing well for a basic character creation with groovy model with the following prompt template:

Nice. Gave it a quick try and at some point it came up with this: 😄

Prompt: and where is that factory located?

The factory is located in a big city called "Robbieville".

Just in case anyone's interested: I've also seen the keywords ### Instruction, ### Information, ### Response in the Alpaca Dataset (project page), so anything trained on that might work well with that prompt (mpt-7b-chat for example is). Additionally, mpt-7b-instruct has just ### Instruction and ### Response in its training Dataset, so that might work, too. Haven't looked into all the others yet.

Edit: Additionally, the following keywords are currently hardcoded into the different backends (haven't looked into where they're used, however):

cosmic-snow avatar May 20 '23 21:05 cosmic-snow

@cosmic-snow Thanks for your reply. I will explore this a bit and see where it gets me ... 🙂

Woisek avatar May 20 '23 23:05 Woisek

Sorry for kicking in, but I find this interesting, so I wanted to know, where I can read about prompt templates and how they are done. Thanks!

Here is a great repo for general possibilities that could be re-tooled for this UI. https://github.com/f/awesome-chatgpt-prompts

I'm going to do a bit of experimenting with the suggestions in this thread. This is a great discussion.

UserB-tm avatar May 21 '23 03:05 UserB-tm

I’m doing well for a basic character creation with groovy model with the following prompt template:

Nice. Gave it a quick try and at some point it came up with this: 😄

Prompt: and where is that factory located?

The factory is located in a big city called "Robbieville".

Just in case anyone's interested: I've also seen the keywords ### Instruction, ### Information, ### Response in the Alpaca Dataset (project page), so anything trained on that might work well with that prompt (mpt-7b-chat for example is). Additionally, mpt-7b-instruct has just ### Instruction and ### Response in its training Dataset, so that might work, too. Haven't looked into all the others yet.

Edit: Additionally, the following keywords are currently hardcoded into the different backends (haven't looked into where they're used, however):

Great find!! gpt.j: image

llama: image

mpt: image

UserB-tm avatar May 21 '23 03:05 UserB-tm

[comment major revision: v2]

In this comment I'm going to write a few things about Datasets & Models and will update it occasionally once I know more.

The basics: there are so-called Foundation Models which are trained on big Datasets from various sources. These are then typically fine-tuned with much smaller Datasets for specific purposes. Some models use only parts of datasets as their input. So far I've seen keywords like ### Instruction and ### Response only in Datasets used for fine-tuning (although that doesn't mean much yet, still have a lot to look at). Models come in different sizes -- often 6/7B, 13B, 33B, 65B, or even bigger, where B stands for billion parameters. Bigger models are slower, but they can also be more capable.

Foundation Models

  • GPT-J by EleutherAI, a 6B model trained on the dataset:
    • The Pile
  • LLaMA by Meta AI, a number of differently sized models (7B, 13B, 33B, 65B) trained on several (partially curated) datasets:
    • CommonCrawl
    • C4
    • GitHub
    • Wikipedia
    • Books (in part from 'The Pile')
    • ArXiv
    • Stack Exchange
  • MPT-7B by MosaicML, a 7B model (available as mpt-7b-base, aka. Mosaic mpt-7b or simply MPT), trained on several datasets:
    • C4
    • mC4
    • RedPajama (which is based on others and tries to emulate the LLaMA datasets)

Fine-Tuned Models

  • GPT4All-J by Nomic AI, fine-tuned from GPT-J, by now available in several versions: gpt4all-j, gpt4all-j-v1.1-breezy, gpt4all-j-v1.2-jazzy, gpt4all-j-v1.3-groovy, using the dataset:
    • GPT4All-J Prompt Generations
  • GPT4All 13B snoozy by Nomic AI, fine-tuned from LLaMA 13B, available as gpt4all-l13b-snoozy using the dataset:
    • GPT4All-J Prompt Generations
  • Mosaic MPT-7B-Chat is based on MPT-7B and available as mpt-7b-chat
  • Mosaic MPT-7B-Instruct is based on MPT-7B and available as mpt-7b-instruct
  • Alpaca models are based on LLaMA 7B and use the Stanford Alpaca dataset
  • Vicuña variants [? not yet investigated]
  • Dolly [? listed on the main page comparison but no model]
  • Pythia [? listed on the main page comparison but no model]
  • FastChat [? listed on the main page comparison but no model]
  • StableLM [? listed on the main page comparison but no model]
  • Wizard variants [? not yet investigated]

Datasets (WIP)

| Name | Creator | Links | Size | Open | Purpose | Keywords |
|---|---|---|---|---|---|---|
| The Pile | EleutherAI | homepage ¦ paper | 825GiB | yes | | |
| LLaMA | Meta AI | paper ¦ repo | 4.75TB | no | | |
| RedPajama | Together | Hugging Face ¦ repo | | yes | | |
| CommonCrawl | Common Crawl | | | | | |
| C4 | Google | Hugging Face ¦ paper ¦ repo | 38GB - 27TB | yes | | |
| GPT4All-J Prompt Generations | Nomic AI | homepage ¦ Hugging Face ¦ report 1 ¦ report 2 ¦ report 3 ¦ repo | | yes | | |
| Alpaca | Stanford CRFM | Hugging Face ¦ repo ¦ announcement | 52k | yes | | |
| HH-RLHF | Anthropic | Hugging Face ¦ paper | 169k | yes | helpful & harmless | |
| Dolly-15k | Databricks | Hugging Face | 15k | yes | | |
| dolly_hhrlhf | MosaicML | Hugging Face | 59k | yes | helpful & harmless | ### Instruction ### Response |

to add later: mC4, ShareGPT-Vicuna, HC3, Evol-Instruct, [GitHub], [Wikipedia], [Books], [ArXiV], [Stack Exchange]

Additional Notes

  • LLaMA's exact training data is not public. However, the paper has information on sources and composition
  • C4:
    • based on Common Crawl
    • was created by Google but is documented by the Allen Institute for AI (aka. AI2)
    • comes in 5 variants; the full set is multilingual, but typically the 800GB English variant is meant
    • C4 stands for Colossal Clean Crawled Corpus
  • GPT4All Prompt Generations has several revisions. The latest one (v1.3) is the basis for gpt4all-j-v1.3-groovy and gpt4all-l13b-snoozy
  • HH-RLHF stands for Helpful and Harmless with Reinforcement Learning from Human Feedback


Why does all of this matter? Looks like especially the fine-tuning datasets can have a considerable impact on which Prompt Templates work and which don't. So I wanted to have a bit of an overview.

Last but not least, a note: The models are also typically "downgraded" in a process called quantisation to make it even possible for them to work on consumer-grade hardware. This can negatively impact their performance (in terms of capability, not speed). I'm not yet sure where to find more information on how this was done in any of the models.
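
As a rough back-of-the-envelope illustration of why this is needed: the storage for the weights alone is roughly the parameter count times the bits per weight. The sketch below assumes plain 16-bit and 4-bit storage and ignores both the extra metadata real quantisation formats carry and the memory needed at run time. The bit widths are assumptions for illustration, not a statement about how any of the models above were actually quantised.

```python
def approx_weight_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the weights in gigabytes (1 GB = 10**9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9

for name, n_params in [("7B", 7e9), ("13B", 13e9)]:
    fp16 = approx_weight_size_gb(n_params, 16)
    q4 = approx_weight_size_gb(n_params, 4)
    print(f"{name}: ~{fp16:.1f} GB at 16 bits, ~{q4:.1f} GB at 4 bits")
```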

cosmic-snow avatar May 21 '23 22:05 cosmic-snow

I've been pointed to a very interesting paper which touches on the topic: "Large Language Models are Zero-Shot Reasoners" (pdf), which cites "Chain-of-Thought Prompting Elicits Reasoning in Large Language Models" (pdf).

They say:

[...] While these successes are often attributed to LLMs’ ability for few-shot learning, we show that LLMs are decent zero-shot reasoners by simply adding “Let’s think step by step” before each answer.

based on the referred-to paper:

We explore how generating a chain of thought—a series of intermediate reasoning steps—significantly improves the ability of large language models to perform complex reasoning.

If you're not interested in reading those, one key take-away is, as mentioned: add something like Let's think step by step to your prompt.
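
A minimal sketch of the zero-shot variant as a prompt template, with the trigger phrase placed at the start of the answer so the model continues from it. The `### Human:` / `### Assistant:` layout follows the templates used earlier in this thread, and `zero_shot_cot_template` is a hypothetical helper name. Whether a given model responds well to it is something to test, not a given.

```python
TRIGGER = "Let's think step by step."

def zero_shot_cot_template(trigger: str = TRIGGER) -> str:
    """Template whose answer section already starts with the trigger phrase."""
    return f"### Human:\n%1\n\n### Assistant:\n{trigger}\n"

prompt = zero_shot_cot_template().replace("%1", "Is it dry during summertime in Cairo?")
print(prompt)
```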

I'm currently experimenting with deducing something general from a very narrow, specific fact. Here's a template I've come up with for gpt4all-l13b-snoozy (in the paper they say to prepend the phrase to the answer, but here I'm using it in an example, which makes it "one-shot"/"few-shot" rather than "zero-shot" reasoning):

### System:
Here's an example:

Is it dry during summertime in Cairo?

Let's think step by step.
That sounds like a task which requires encyclopaedic knowledge.
Cairo is a town in Egypt.
Egypt has a hot desert climate and rainfall in summer drops to almost 0.
It's safe to say that it's extremely dry in Cairo.

###Human:
%1

### Assistant:

And I feel like this helps. But I'll have to do more tests and try different models to get a better picture. To test it I used e.g. the prompt (with a location I randomly looked for on Wikipedia): "What are the conditions like around lake Chala?". Some problems/notes:

  • sometimes repeats the input/example
  • when it doesn't get the initial fact right (what/where lake Chala actually is), the response is of course not very useful
  • might even be case sensitive, i.e. 'Lake Chala' in the question might be better than 'lake Chala'
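
The one-shot template above generalises to several worked examples. Here is a sketch of a small builder; `Example` and `few_shot_template` are hypothetical names, and the layout simply mirrors the template above (with `### Human:` written with a space).

```python
from dataclasses import dataclass

@dataclass
class Example:
    question: str
    steps: list[str]  # reasoning lines, ending with the conclusion

def few_shot_template(examples: list[Example]) -> str:
    """Build a template with worked examples in the ### System: section."""
    parts = ["### System:"]
    for ex in examples:
        parts.append("Here's an example:\n")
        parts.append(ex.question + "\n")
        parts.append("Let's think step by step.")
        parts.extend(ex.steps)
        parts.append("")  # blank line after each example
    parts.append("### Human:\n%1\n")
    parts.append("### Assistant:\n")
    return "\n".join(parts)

cairo = Example(
    question="Is it dry during summertime in Cairo?",
    steps=[
        "That sounds like a task which requires encyclopaedic knowledge.",
        "Cairo is a town in Egypt.",
        "Egypt has a hot desert climate and rainfall in summer drops to almost 0.",
        "It's safe to say that it's extremely dry in Cairo.",
    ],
)
print(few_shot_template([cairo]))
```

Keep in mind that every extra example uses up context, and as noted, the model sometimes just repeats the example.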

cosmic-snow avatar May 24 '23 20:05 cosmic-snow

Would love any advice on prompt engineering for mpt-7b-instruct where I provide a context from a local embeddings store. I'm really used to openAI's simplistic syntax of:

You will be provided a context, answer the following question based on the context

Context: Question:

I'm finding it very hit-and-miss with mpt-7b-instruct, but more reliable still than mpt-7b-chat. Also it seems the mpt-7b-instruct response is more willing to derive an answer from outside the locally provided context... I think language like "don't rely on any sources outside of the context" can help. Anyhow, does anyone have good experience with this approach on mpt-7b-instruct? I'm finding very few examples online that involve a local embeddings store as part of the prompt. I don't even really know how necessary #### and newline separators are for the LLM to interpret the context vs question.

etanlightstone-domino avatar May 29 '23 19:05 etanlightstone-domino

Would love any advice on prompt engineering for mpt-7b-instruct where I provide a context from a local embeddings store.

I haven't played around with a local embeddings store, so I'm not sure I can help you there. Also, recently I've been busy with a few other things and haven't had much time to invest into this. However, I can tell you a few things you can try that helped me so far:

  • Add "Let's think step by step" either before your example, or right at the end of the prompt, so it's before the model's response.
  • Start it off with a suitable "You are a..." / "Pretend to be a..." / "Act like a..." and a role of authority, e.g. a "senior <profession>" / "Professor in <field>" / "expert in <profession/field>".
  • Maybe add a general, easy, correct example at the beginning. In the form you'd like it to respond.
  • Not sure, but for more complex tasks, it might be better to try bigger models. Although there seems to be a lot of variability among the provided ones.
  • Regenerate the response a few times if it doesn't work right away. If it's "stuck", turn up the Temperature.
  • Remember: it's ultimately trying to complete a text, not chat with you. It just so happens that this text is in the form of a dialog.

Also, in what I've tried so far, it does depend on the model you pick. A good prompt in one does not necessarily mean it works well in another. It's a big part of why I started looking into the training sets (see comment above). For example, mpt-7b-instruct uses the following: dolly_hhrlhf. In that, they use the keywords ### Instruction and ### Response.

I don't even really know how necessary #### and newline separators are for the LLM to interpret the context vs question.

I don't think newlines are very important, but ### might be. As mentioned, see the mpt-7b-instruct dataset.
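
For the context-plus-question case, here is a sketch of how the pieces could be assembled using the `### Instruction` / `### Response` keywords from the dolly_hhrlhf dataset. The `Context:` and `Question:` labels and the wording of the instruction are assumptions carried over from the question above, not something taken from the dataset. `build_context_prompt` is a hypothetical helper, and `chunks` stands for whatever text the embeddings store returns.

```python
def build_context_prompt(chunks: list[str], question: str) -> str:
    """Assemble retrieved text and a question into one instruction prompt."""
    context = "\n\n".join(chunk.strip() for chunk in chunks)
    return (
        "### Instruction:\n"
        "Answer the question based only on the context below. "
        "Don't rely on any sources outside of the context.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n\n"
        "### Response:\n"
    )

print(build_context_prompt(["first retrieved passage", "second retrieved passage"],
                           "what do the passages say?"))
```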

Last but not least, also have a look at this issue: #736 Prompts that you think do work.

cosmic-snow avatar May 29 '23 21:05 cosmic-snow

I'll leave this open for future reference. This is interesting.

niansa avatar Aug 11 '23 13:08 niansa

"###Instruction: ###Information: ###Prompt: %1 ###Response:"

this type of prompt template is worth gold! i´m messing up with gpt4all for 5 days, from getting no answer to getting answers i dont wanna get, and back to no answers. holy shit my odyssey ends here x)

Macalanias avatar Nov 23 '23 22:11 Macalanias