Update python binding prompt formatting

Aetheris743 opened this pull request 2 years ago • 8 comments

Describe your changes

I added the ### Prompt: header to every user prompt and made newline/space usage consistent.
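For reference, a minimal sketch of the formatting this change aims for (the helper name is illustrative, not the actual binding internals):

def _format_user_prompt(message: str) -> str:
    # Every user message gets its own "### Prompt:" header, with a
    # consistent newline before and after "### Response:".
    return f"### Prompt:\n{message}\n### Response:\n"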

Issue ticket number and link

#653

Checklist before requesting a review

  • [x] I have performed a self-review of my code.
  • [ ] If it is a core feature, I have added thorough tests.
  • [ ] I have added thorough documentation for my code.
  • [ ] I have tagged PR with relevant project labels. I acknowledge that a PR without labels may be dismissed.
  • [ ] If this PR addresses a bug, I have provided both a screenshot/video of the original bug and the working solution.

Demo

This is an example of the output I was getting from ggml-gpt4all-j-v1.3-groovy before the changes (verbose output):

### Instruction:
            The prompt below is a question to answer, a task to complete, or a conversation
            to respond to; decide which and write an appropriate response.

### Prompt:
what is the mass of aluminium? is it strong?
### Response:  The mass of aluminium is approximately 2.8 x 10^-3 kilograms, and it is considered strong.
it is used in
### Response:
  The mass of aluminium is approximately 2.8 x 10^-3 kilograms, and it is considered strong.It's used in a variety of applications, such as construction materials and alloys.

After the formatting changes I got these responses:

### Instruction:
            The prompt below is a question to answer, a task to complete, or a conversation
            to respond to; decide which and write an appropriate response.

### Prompt:
what is the mass of aluminium? is it strong?
### Response:
 The mass of aluminium is approximately 2.8 x 10^-3 kilograms, and it is considered a strong metal.
### Prompt:
it is used in
### Response:
 Yes, aluminum is used in many applications such as construction materials, cookware, and even in the production of aluminum cans.

By prepending ### Prompt: to user prompts, the model seems much less likely to interpret them as part of its last response. It also repeats less of its previous responses in some cases.

Aetheris743 avatar May 20 '23 23:05 Aetheris743

Hmm, in the end I think the binding needs to expose an API for this. But I'll leave it up to the python folks...

manyoso avatar May 21 '23 14:05 manyoso

Given how sensitive each model is to prompt templates, linking the backend together with per model prompt templates is likely something we want in the backend bindings. That being said, let's merge this so people have a good experience with the python bindings.
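As a hypothetical sketch of what per-model templates on the binding side could look like (the names and the registry here are illustrative, not a real API):

# Map model files to the prompt template each model was tuned on.
PROMPT_TEMPLATES = {
    "ggml-gpt4all-j-v1.3-groovy.bin": "### Prompt:\n{0}\n### Response:\n",
}

def build_prompt(model_file: str, message: str) -> str:
    # Fall back to passing the message through unchanged for unknown models.
    template = PROMPT_TEMPLATES.get(model_file, "{0}")
    return template.format(message)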

AndriyMulyar avatar May 21 '23 16:05 AndriyMulyar

Why does it print the prompt and all the debug information?

Found model file.
gptj_model_load: loading model from '/Users/juno/.cache/gpt4all/ggml-gpt4all-j-v1.3-groovy.bin' - please wait ...
gptj_model_load: n_vocab = 50400
gptj_model_load: n_ctx   = 2048
gptj_model_load: n_embd  = 4096
gptj_model_load: n_head  = 16
gptj_model_load: n_layer = 28
gptj_model_load: n_rot   = 64
gptj_model_load: f16     = 2
gptj_model_load: ggml ctx size = 5401.45 MB
gptj_model_load: kv self size  =  896.00 MB
gptj_model_load: ................................... done
gptj_model_load: model size =  3609.38 MB / num tensors = 285
### Instruction: 
            The prompt below is a question to answer, a task to complete, or a conversation 
            to respond to; decide which and write an appropriate response.

Can this be removed?

lodenrogue avatar May 22 '23 01:05 lodenrogue

I am new to using gpt4all, but I believe that this behavior can be disabled by setting verbose to false when calling the function. chat_completion(..., verbose=False)
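For example (a sketch mirroring the calls used elsewhere in this thread):

import gpt4all

gptj = gpt4all.GPT4All("ggml-gpt4all-j-v1.3-groovy")
messages = [{"role": "user", "content": "what is the mass of aluminium?"}]
# verbose=False suppresses the prompt echo during generation
response = gptj.chat_completion(messages, verbose=False)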

Aetheris743 avatar May 22 '23 01:05 Aetheris743

I am new to using gpt4all, but I believe that this behavior can be disabled by setting verbose to false when calling the function. chat_completion(..., verbose=False)

That's what I have in my code but I'm still getting those statistics:

gptj.chat_completion(messages, streaming=False, verbose=False)

lodenrogue avatar May 22 '23 01:05 lodenrogue

Best I've found so far is doing python myscript.py 2> /dev/null

That removes the statistics but I'm still getting the message: Found model file.

lodenrogue avatar May 22 '23 01:05 lodenrogue

The model loading statistics and the Found model file/downloading messages all happen during model instantiation (gpt4all.GPT4All(model_name)). The verbose=False option only controls output during chat_completion, so it has no effect on these.

I can add a verbose option for model loading and change it so "Found model file" only prints when verbose is True. The model loading statistics come from the C library, so they're harder to control; they could be suppressed by redirecting Python's sys.stdout, which is how we're collecting the model response already.
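A rough sketch of the first option (the function, parameter, and directory names are illustrative, not the actual binding code):

import os

MODEL_DIR = os.path.expanduser("~/.cache/gpt4all")  # illustrative default

def retrieve_model(model_filename: str, verbose: bool = True) -> str:
    # Only announce a cache hit when the caller asked for verbose output.
    model_path = os.path.join(MODEL_DIR, model_filename)
    if os.path.exists(model_path):
        if verbose:
            print("Found model file.")
        return model_path
    raise FileNotFoundError(model_path)  # the real code would download here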

I would be a little hesitant to do this in multiple places in the binding code, though, since it feels duplicative and sloppy to keep swapping sys.stdout inside a Python library. I think this is better handled on the user side, similar to @lodenrogue's approach, for anyone who doesn't want to see model loading information:

import sys
import os
import gpt4all

# Suppress output
orig_stdout = sys.stdout
dummy_file = open(os.devnull, 'w')
sys.stdout = dummy_file

# Load model
model = gpt4all.GPT4All(....)

# Revert to normal stdout
sys.stdout = orig_stdout
dummy_file.close()
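An equivalent, slightly tidier variant of the same idea using the standard library's contextlib.redirect_stdout (still a user-side sketch):

import os
import contextlib
import gpt4all

# Everything printed to stdout inside the block goes to /dev/null;
# stdout is restored automatically when the block exits.
with open(os.devnull, 'w') as devnull, contextlib.redirect_stdout(devnull):
    model = gpt4all.GPT4All("ggml-gpt4all-j-v1.3-groovy")

Note that, like the snippet above, this only silences Python-level prints such as "Found model file."; per the 2> /dev/null observation, the C-side loading statistics go to stderr instead.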

Thoughts??

rguo123 avatar May 22 '23 14:05 rguo123

I updated the tests. They now pass.

Aetheris743 avatar Jun 01 '23 17:06 Aetheris743

There are better versions of this in other PRs.

Aetheris743 avatar Jul 12 '23 06:07 Aetheris743

Ah right I see, your PR got overlooked. They should've merged it a long time ago.

But things have moved on since then, anyway. There is a big one lined up now which has better prompt handling.

cosmic-snow avatar Jul 12 '23 06:07 cosmic-snow