How to train an instruction-following code generation model based on StarCoder and ta-prompt?

Open jsuper opened this issue 2 years ago • 1 comments

How can I train an instruction-following code generation model based on StarCoder and ta-prompt?

The official documentation mentions that ta-prompt can be used to turn the model into a technical assistant, but there is no guide explaining how to do it.

The model was trained on GitHub code. As such it is not an instruction model and commands like "Write a function that computes the square root." do not work well. However, by using the Tech Assistant prompt you can turn it into a capable technical assistant.

Jul 17 '23 07:07 jsuper

Hi, I'll give you a high-level view of how you can proceed. The point of using ta-prompt is precisely to avoid instruction fine-tuning: it turns the model into a technical assistant by relying on in-context learning. You provide a set of conversations in the format that you want your assistant to mimic, for example:

prompt = """
Human: <question> (e.g. Write a function square_sum which takes a list arr as input and return the square of the sum of its element.)
Assistant: <answer> (e.g. def square_sum(arr):\n\treturn sum(arr)**2.)

H: <question>
Assistant: <answer>

"""

# If you have your conversations stored in the file prompt_file.txt you can do this

with open("./prompt_file.txt", "r") as f:
    prompt = f.read() + "\n\n"
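If the conversations live in Python rather than in a text file, the same prompt can be assembled programmatically. The sketch below is only an illustration: `format_conversations` and the `examples` list are hypothetical names, not part of ta-prompt or StarCoder, and the layout simply mirrors the Human/Assistant format shown above.

```python
def format_conversations(pairs):
    """Render (question, answer) pairs as Human/Assistant turns.

    Turns are separated by a blank line, and the result ends with a
    blank line so that a new "Human: ..." turn can be appended directly.
    """
    turns = [f"Human: {question}\nAssistant: {answer}" for question, answer in pairs]
    return "\n\n".join(turns) + "\n\n"


# Hypothetical few-shot examples; replace them with your own conversations.
examples = [
    (
        "Write a function square_sum which takes a list arr as input "
        "and return the square of the sum of its element.",
        "def square_sum(arr):\n    return sum(arr)**2",
    ),
]

prompt = format_conversations(examples)
print(prompt)
```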

Now, if you want your model to answer a specific request, e.g. instruction = "Write a function to compute the gcd between 2 positive integers a and b.", you have to prepend your prompt to it. You will feed your model with

prompt + "Human: "+instruction+"\nAssistant: "

The model should then complete it with the desired implementation. You can use the bigcode-playground to check that this works and to run further tests.
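Outside the playground, one way to try this locally is with the Hugging Face `transformers` library. This is a minimal sketch, not the official recipe: `build_input` and `generate` are hypothetical helper names, the checkpoint name `bigcode/starcoder` and the value of `max_new_tokens` are assumptions, and loading the model requires access to the checkpoint and enough memory.

```python
def build_input(prompt, instruction):
    """Append the new request to the few-shot prompt, leaving the Assistant turn open."""
    return prompt + "Human: " + instruction + "\nAssistant: "


def generate(text, checkpoint="bigcode/starcoder", max_new_tokens=128):
    """Complete `text` with a causal language model (assumed checkpoint name)."""
    # Imported lazily so that build_input works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForCausalLM.from_pretrained(checkpoint)
    inputs = tokenizer(text, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # The decoded text contains the input followed by the completion.
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


# Hypothetical one-example prompt in the Human/Assistant format.
few_shot = (
    "Human: Write a function square_sum which takes a list arr as input "
    "and return the square of the sum of its element.\n"
    "Assistant: def square_sum(arr):\n    return sum(arr)**2\n\n"
)
instruction = "Write a function to compute the gcd between 2 positive integers a and b."
model_input = build_input(few_shot, instruction)
print(model_input)
# completion = generate(model_input)  # uncomment to run the model
```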

This input:

H: Write a function square_sum which takes a list arr as input and return the square of the sum of its element.
Assistant: 
def square_sum(arr):
   return sum(arr)**2.

H: Write a function to compute the gcd between 2 positive integers a and b.
Assistant:

should give this output:

H: Write a function square_sum which takes a list arr as input and return the square of the sum of its element.
Assistant: 
def square_sum(arr):
   return sum(arr)**2.

H: Write a function to compute the gcd between 2 positive integers a and b.
Assistant: def gcd(a,b)
    if (b==0):
        return a; 
    else : 
        return gcd(b, a%b);
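Since the output repeats the whole input before the new answer, you usually want to keep only the last Assistant turn. The helper below is a rough sketch under two assumptions that are not stated in this thread: the output starts with the exact input text, and the model may continue with a new "Human:" turn after answering. `extract_answer` is a hypothetical name.

```python
def extract_answer(model_input, model_output, stop="\nHuman:"):
    """Return only the newly generated Assistant answer."""
    # Drop the echoed input when the output starts with it; otherwise keep
    # the output unchanged (decoding may not reproduce the input exactly).
    if model_output.startswith(model_input):
        completion = model_output[len(model_input):]
    else:
        completion = model_output
    # Cut at the next "Human:" turn in case the model keeps the dialogue going.
    cut = completion.find(stop)
    if cut != -1:
        completion = completion[:cut]
    return completion.strip()


model_input = "Human: Write a function that doubles x.\nAssistant: "
model_output = (
    model_input
    + "def double(x):\n    return 2 * x\n\nHuman: Another question\nAssistant: ..."
)
print(extract_answer(model_input, model_output))
```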

Now, this in-context learning method has its cons. If you have an instruction dataset, you can use the code provided in this directory in order to perform instruction fine-tuning.

Jul 17 '23 10:07 ArmelRandy