Better inference based on the starcoder2-3b model
I am new to StarCoder. When I run the following demo:
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

checkpoint = "./starcoder2-3b"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint, device_map="auto", torch_dtype=torch.bfloat16)

inputs = tokenizer.encode("def is_prime(n):", return_tensors="pt").to("cuda")
outputs = model.generate(inputs)
print(tokenizer.decode(outputs[0]))
```
It returns:
```python
def is_prime():
    """
    This function checks if a number is prime or not.
    """
```
It doesn't finish, so I set max_length=120, and then it returns:
```python
def is_prime():
    """
    This function checks if a number is prime or not.
    """
    num = int(input("Enter a number: "))
    if num > 1:
        for i in range(2, num):
            if (num % i) == 0:
                print(num, "is not a prime number")
                break
        else:
            print(num, "is a prime number")
    else:
        print(num, "is not a prime number")

is_prime()
<file_sep>/README.md
# Python-
```
The part

```
is_prime()
<file_sep>/README.md
# Python-
```

is redundant. Now my solution is:
```python
generated_code = tokenizer.decode(outputs[0])
if "<file_sep>" in generated_code:
    generated_code = generated_code.split("<file_sep>")[0]
print(generated_code)
```
But I don't think it is a good idea. I want the model to return the results in one go, without generating redundant parts. How can I do that? Could you give me some advice?
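One idea I have considered, but have not verified on starcoder2-3b, is to make generate() stop as soon as the `<file_sep>` token is produced, by passing its token id as `eos_token_id`. This sketch assumes `<file_sep>` is a single special token in the tokenizer's vocabulary; is this the right approach?

```python
# Possible sketch (unverified assumption): treat the id of <file_sep> as an
# end-of-sequence token so generation stops before the redundant part.
file_sep_id = tokenizer.convert_tokens_to_ids("<file_sep>")

outputs = model.generate(
    inputs,
    max_length=120,
    eos_token_id=file_sep_id,  # stop as soon as <file_sep> is generated
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```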
Or, I noticed that in https://huggingface.co/bigcode/starcoder2-3b The inference API can generate code piece by piece, each time I press the Compute. How can I implement such functionality? (For example, in python, every time I send a request, the model returns me a portion of the results. The next time I send a request, it will send the request based on the previous request + previous results it returns. In this way, the code can be completed step by step without creating redundant parts.) Many thanks for your advice!
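My rough idea of that step-by-step loop looks like the sketch below (the chunk size of 32 tokens and the number of rounds are arbitrary assumptions on my part); is this the intended way to do it?

```python
# Rough sketch of step-by-step completion: each round feeds the previous
# prompt plus everything generated so far back into the model.
prompt = "def is_prime(n):"
for _ in range(4):  # each iteration plays the role of one "Compute" press
    inputs = tokenizer.encode(prompt, return_tensors="pt").to("cuda")
    outputs = model.generate(inputs, max_new_tokens=32)  # 32 is an arbitrary chunk size
    prompt = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(prompt)
```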