llama
The response from meta-llama/Llama-2-7b-chat-hf ends with an incomplete sentence when I try to run inference.
I loaded meta-llama/Llama-2-7b-chat-hf onto the GPU and tried to get a response to a question. Here is the key part of the code:
def load_model(model_name, bnb_config):
    n_gpus = torch.cuda.device_count()
    max_memory = f'{40960}MB'
    # Load the model, passing the auth token as an argument
    model = AutoModelForCausalLM.from_pretrained(
        model_name,
        quantization_config=bnb_config,
        use_auth_token=huggingface_token,
        device_map="auto",  # dispatch the model efficiently across the available resources
        max_memory={i: max_memory for i in range(n_gpus)},
    )
    tokenizer = AutoTokenizer.from_pretrained(
        model_name,
        use_auth_token=True,
        add_eos_token=True,
        use_fast=False,
        trust_remote_code=True,
    )
    tokenizer.pad_token = tokenizer.eos_token
    tokenizer.pad_token_id = 18610
    tokenizer.padding_side = "right"  # fix a weird overflow issue with fp16 training
    return model, tokenizer
model_name = "meta-llama/Llama-2-7b-chat-hf"
bnb_config = create_bnb_config()
model, tokenizer = load_model(model_name, bnb_config)
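`create_bnb_config()` is not shown in the snippet. A typical 4-bit quantization config for this setup might look like the following sketch; this is an assumption about what the helper does, not the poster's actual code:

```python
# Hypothetical sketch of create_bnb_config() -- the original is not shown.
# Assumes the bitsandbytes-backed 4-bit options exposed by transformers.
import torch
from transformers import BitsAndBytesConfig

def create_bnb_config():
    return BitsAndBytesConfig(
        load_in_4bit=True,                      # quantize weights to 4 bits
        bnb_4bit_use_double_quant=True,         # nested quantization for extra savings
        bnb_4bit_quant_type="nf4",              # NormalFloat4 quantization type
        bnb_4bit_compute_dtype=torch.bfloat16,  # dtype used for the matmuls
    )
```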
B_S = "<s>"
B_INST, E_INST = "[INST]", "[/INST]"
B_SYS, E_SYS = "<<SYS>>\n", "\n<</SYS>>\n\n"
SPECIAL_TAGS = [B_INST, E_INST, "<<SYS>>", "<</SYS>>"]
system_prompt = "You are an helpful AI assistant, please answer this question:"
user_message = "How to achieve high grade in math for a first year student in high school?"
prompt = f"{B_S}{B_INST}{B_SYS}{system_prompt.strip()} {E_SYS} {user_message.strip()}{E_INST}\n\n"  # note: no trailing comma, which would turn prompt into a 1-tuple
input_ids = tokenizer(prompt, return_tensors="pt",return_attention_mask=False).input_ids.to('cuda')
outputs = model.generate(input_ids, max_length=512)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print("Before fine-tuning the response is: ",response)
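As a side note, the printed response begins with the prompt text because `outputs[0]` contains the prompt token ids followed by the generated ids; slicing the prompt off before decoding keeps only the answer. A pure-Python sketch with made-up token ids:

```python
# outputs[0] from generate() is prompt ids followed by newly generated ids.
# The token ids below are made up for illustration only.
prompt_ids = [1, 518, 25580, 29962]          # stands in for input_ids[0]
output_ids = prompt_ids + [6324, 29991, 2]   # stands in for outputs[0]

# Keep only the tokens generated after the prompt.
new_tokens = output_ids[len(prompt_ids):]
print(new_tokens)  # → [6324, 29991, 2]

# With real tensors the same idea is:
#   response = tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True)
```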
The output is as follows:
[INST]<<SYS>> You are an helpful AI assistant, please answer this question: <</SYS>>
How to achieve high grade in math for a first year student in high school?[/INST]
- Practice consistently: Regular and consistent practice is essential to improve in math. Set aside a specific time each day to practice solving math problems, even if it's just for 15-20 minutes. You can use worksheets, online resources, or practice tests to help you.
- Understand the basics: Make sure you have a solid understanding of basic math concepts such as fractions, decimals, percentages, algebra, and geometry. Review these basics regularly, and practice working with simple problems to build your confidence.
- Break down problems: When solving math problems, break them down into smaller, manageable steps. This will help you understand the problem better and make it easier to solve.
- Seek help when needed: Don't be afraid to ask for help when you're struggling with a math concept or problem. You can ask your teacher, tutor, or classmate for assistance.
- Watch video tutorials: Watching video tutorials can help you visualize math concepts and problems better. You can find plenty of math video tutorials on websites such as Khan Academy, Mathway, or MIT OpenCourseWare.
- Take your time: Don't rush through math problems. Take your time to read the problem carefully, understand it, and work through it step by step.
- Use visual aids: Visual aids such as graphs, charts, and diagrams can help you understand complex math concepts better. Use them to visualize the problem and find a solution.
- Practice with real-world examples: Try to relate math concepts to real-world examples. This will help you understand how math is used in everyday life and make it more interesting.
- Stay organized: Keep all your math materials organized, including worksheets, notes, and textbooks. This will help you find what you need quickly and avoid wasting time searching for materials.
- Review regularly: Review math concepts regularly, even after you think you understand them. This will help you retain the information and avoid

Why does the response end here, not with a complete sentence? How can I solve this? Thank you!
Can anybody help with this question? Thank you so much, I appreciate it!
outputs = model.generate(input_ids, max_length=512) — the maximum generation length is set to 512 here; generation stops automatically once this length is reached.
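As the answer above notes, `max_length` caps the prompt tokens and the generated tokens combined, so a long prompt leaves the model few tokens for its answer and the text gets cut mid-sentence. A minimal sketch of the arithmetic (the token counts are illustrative, not measured from the actual prompt):

```python
# max_length bounds prompt + generated tokens together, so a long prompt
# squeezes the completion. (Token counts below are illustrative only.)

def generation_budget(prompt_tokens: int, max_length: int) -> int:
    """Tokens the model may still generate under a max_length cap."""
    return max(max_length - prompt_tokens, 0)

print(generation_budget(60, 512))   # 452 tokens left for the answer
print(generation_budget(600, 512))  # 0 -- generation is cut off immediately

# The usual fix is to cap only the newly generated tokens instead:
#   outputs = model.generate(input_ids, max_new_tokens=512)
# max_new_tokens is a standard argument of transformers' generate()/GenerationConfig.
```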