Problems with no_repeat_ngram_size

Open • kliffeup opened this issue 9 months ago • 1 comment

Hi! First and foremost, thanks for the great package!

I converted the model ehartford/WizardLM-Uncensored-Falcon-7b from HuggingFace with the following command:

ct2-transformers-converter --model ehartford/WizardLM-Uncensored-Falcon-7b --output_dir ./falcon_7b_fp32 --trust_remote_code

After that, I encountered repetitions from the converted model. I used a prompt in which I asked the model several questions in succession.

  1. I used:
  • BestSampler (sampling_topk = 1)
  • GreedySearch (beam_size = 1)
  • no_repeat_ngram_size = 10

With these settings, the converted model slides into repetitions.

Code for generation with the converted model:

import ctranslate2
import transformers

generator = ctranslate2.Generator("./falcon_7b_fp32", device="cuda")
tokenizer = transformers.AutoTokenizer.from_pretrained("ehartford/WizardLM-Uncensored-Falcon-7b", trust_remote_code=True)

prompt = "The following is a common sense conversation of Jill with a John. Goal of Jill is to answer Johns questions honestly.\nHere goes some facts about Jill:\nJill have a corgi dog, and her name is Cinnamon.\nJill isn't a cat person\nJill is a dog person.\n"

questions = (
    "John: Answer in a verbose and detailed manner: What's your opinion on dogs versus cats?\nJill:",
    "John: Give me a meticulous explanation: What's your opinion on dogs versus cats?\nJill:",
    "John: Reply with a witty remark: What's your opinion on dogs versus cats?\nJill:",
    "John: Answer briefly: What's your opinion on dogs versus cats?\nJill:",
    "John: Please answer in a verbose and detailed manner further. Dogs or cats?\nJill:",
)

answer = ""
for question in questions:
    # Append the previous answer and the next question, so the prompt grows each turn.
    prompt += answer + question
    tokens = tokenizer.tokenize(prompt)

    results = generator.generate_batch(
        [tokens],
        sampling_topk=1,  # greedy: always take the most likely token
        no_repeat_ngram_size=10,
        max_length=384,
        end_token=["Ċ", "<|endoftext|>"],  # stop at a newline ("Ċ") or at end of text
        include_prompt_in_result=False,
    )
    answer = tokenizer.decode(results[0].sequences_ids[0])
    if answer[-1] != "\n":
        answer += "\n"

print(prompt + answer)

This produces the following output:

The following is a common sense conversation of Jill with a John. Goal of Jill is to answer Johns questions honestly.
Here goes some facts about Jill:
Jill have a corgi dog, and her name is Cinnamon.
Jill isn't a cat person
Jill is a dog person.
John: Answer in a verbose and detailed manner: What's your opinion on dogs versus cats?
Jill: I prefer dogs over cats. I think dogs are more loyal and affectionate. They are always happy to see you and are willing to play with you. Cats, on the other hand, can be aloof and independent. They are more solitary creatures and prefer to be left alone. I also think dogs are better for families with children because they are more patient and gentle with them. In contrast, cats can be unpredictable and may scratch or bite if they feel threatened. Overall, I think dogs make better pets than cats.
John: Give me a meticulous explanation: What's your opinion on dogs versus cats?
Jill: I prefer dogs over cats. I think dogs are more loyal and affectionate. They are always happy to see you and are willing to play with you. Cats, on the other hand, can be aloof and independent. They are more solitary creatures and prefer to be left alone. I also think dogs are better for families with children because they are more patient and gentle with them. In contrast, cats can be unpredictable and may scratch or bite if they feel threatened. Overall, I think dogs make better pets than cats.
John: Reply with a witty remark: What's your opinion on dogs versus cats?
Jill: I prefer dogs over cats. I think dogs are more loyal and affectionate. They are always happy to see you and are willing to play with you. Cats, on the other hand, can be aloof and independent. They are more solitary creatures and prefer to be left alone. I also think dogs are better for families with children because they are more patient and gentle with them. In contrast, cats can be unpredictable and may scratch or bite if they feel threatened. Overall, I think dogs make better pets than cats.
John: Answer briefly: What's your opinion on dogs versus cats?
Jill: I prefer dogs over cats. Dogs are more loyal and affectionate. They are always happy to see you and are willing to play with you. Cats, on the other hand, can be aloof and independent. They are more solitary creatures and prefer to be left alone. I also think dogs are better for families with children because they are more patient and gentle with them. In contrast, cats can be unpredictable and may scratch or bite if they feel threatened. Overall, I think dogs make better pets than cats.
John: Please answer in a verbose and detailed manner further. Dogs or cats?
Jill: I prefer dogs over cats. Dogs are more loyal and affectionate. They are always happy to see you and are willing to play with you. Cats, on the other hand, can be aloof and independent. They are more solitary creatures and prefer to be left alone. I also think dogs are better for families with children because they are more patient and gentle with them. In contrast, cats can be unpredictable and may scratch or bite if they feel threatened. Overall, I think dogs make better pets than cats.

However, the original model from Hugging Face, run with a similar set of parameters, produces non-repeating output.

Code for generation with the original model:

from transformers import AutoTokenizer
import transformers
import torch

model = "ehartford/WizardLM-Uncensored-Falcon-7b"

tokenizer = AutoTokenizer.from_pretrained(model, trust_remote_code=True)
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    torch_dtype=torch.float32,
    device_map="auto",
    trust_remote_code=True,
)

prompt = "The following is a common sense conversation of Jill with a John. Goal of Jill is to answer Johns questions honestly.\nHere goes some facts about Jill:\nJill have a corgi dog, and her name is Cinnamon.\nJill isn't a cat person\nJill is a dog person.\n"

questions = (
    "John: Answer in a verbose and detailed manner: What's your opinion on dogs versus cats?\nJill:",
    "John: Give me a meticulous explanation: What's your opinion on dogs versus cats?\nJill:",
    "John: Reply with a witty remark: What's your opinion on dogs versus cats?\nJill:",
    "John: Answer briefly: What's your opinion on dogs versus cats?\nJill:",
    "John: Please answer in a verbose and detailed manner further. Dogs or cats?\nJill:",
)

answer = ""

for question in questions:
    # Append the previous answer and the next question, so the prompt grows each turn.
    prompt += answer + question

    sequences = pipeline(
        prompt,
        max_new_tokens=384,
        do_sample=False,  # greedy decoding
        num_return_sequences=1,
        # Stop at end of text or at a newline token.
        eos_token_id=[tokenizer.eos_token_id, tokenizer.encode("\n")[0]],
        return_full_text=False,
        no_repeat_ngram_size=10,
    )

    answer = sequences[0]["generated_text"]
    if answer[-1] != "\n":
        answer += "\n"

print(prompt + answer)

Output:

The following is a common sense conversation of Jill with a John. Goal of Jill is to answer Johns questions honestly.
Here goes some facts about Jill:
Jill have a corgi dog, and her name is Cinnamon.
Jill isn't a cat person
Jill is a dog person.
John: Answer in a verbose and detailed manner: What's your opinion on dogs versus cats?
Jill: I prefer dogs over cats. I think dogs are more loyal and affectionate. They are always happy to see you and are willing to play with you. Cats, on the other hand, can be aloof and independent. They are more solitary creatures and prefer to be left alone. I also think dogs are better for families with children because they are more patient and gentle with them. In contrast, cats can be unpredictable and may scratch or bite if they feel threatened. Overall, I think dogs make better pets than cats.
John: Give me a meticulous explanation: What's your opinion on dogs versus cats?
Jill: My opinion is that dogs are generally better pets than cats. Dogs are more loyal and affectionate, and they are always willing to play with you. They are also more patient and gentle with children. Cats, on the other hand, can sometimes be aloof and independent, and they may scratch or bite if they feel threatened. Dogs are also generally healthier than cats, as they require more exercise and attention. In summary, I think dogs make better pets than cats because they are more loyal, affectionate, and better suited for families with children.
John: Reply with a witty remark: What's your opinion on dogs versus cats?
Jill: Dogs are like iPhones, they come in different colors and sizes, but they all have the same operating system. Cats, on the other hand, are like Android phones, they come in different shapes and sizes, but they all have their own unique features and operating systems. In summary, I prefer dogs because they are more loyal and affectionate, and they come in different colors and sizes, but with the same operating system. Cats, on the contrary, are more independent and come in different shapes and sizes, but they have their own unique features and operating systems.
John: Answer briefly: What's your opinion on dogs versus cats?
Jill: In my opinion, dogs are better pets than cats. Dogs are more loyal, affectionate, and better suited to families with children. They also come in different colors and sizes, but with a similar operating system. Cats, on the other hand are more independent and come in different shapes and styles, but they have their own unique features and systems.
John: Please answer in a verbose and detailed manner further. Dogs or cats?
Jill: I prefer dogs over cat. Dogs are more loyal and affectionate, and I think they make better pets for families with children. They also come in different shapes and sizes, but with a similar operating system, which makes them easier to train and socialize. Cats, on the other hand, are more independent and come in different shapes, sizes, and styles, but they have their own unique characteristics and systems. In summary, I think dogs are better pets than cats because they are more loyal and affectionate, and easier to train and socialize.

Is this behavior expected for a combination of BestSampler and GreedySearch despite the no_repeat_ngram_size parameter?
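
For reference, n-gram blocking in general can be sketched as below. This is a plain-Python illustration, not the CTranslate2 or Transformers implementation; the function name `banned_next_tokens` and the toy token lists are hypothetical. It shows that the result depends on whether the history passed to the check includes the prompt (where the earlier answers live) or only the newly generated tokens. Which of the two each library uses is not established in this issue.

```python
def banned_next_tokens(history, n):
    """Return the tokens that would complete an n-gram already present in `history`."""
    if n <= 0 or len(history) < n:
        return set()
    # The last n-1 tokens form the prefix of the n-gram about to be completed.
    prefix = tuple(history[len(history) - (n - 1):]) if n > 1 else ()
    banned = set()
    for i in range(len(history) - n + 1):
        # If an earlier n-gram starts with the same prefix, its last token is banned.
        if tuple(history[i:i + n - 1]) == prefix:
            banned.add(history[i + n - 1])
    return banned


# Toy data (hypothetical): the prompt already contains an earlier answer.
prompt_tokens = ["ĠI", "Ġprefer", "Ġdogs", "Ġover", "Ġcats", ".", "Ċ"]
generated = ["ĠI", "Ġprefer"]

# History = prompt + generated: repeating the earlier answer is blocked.
with_prompt = banned_next_tokens(prompt_tokens + generated, n=3)

# History = generated tokens only: nothing is blocked yet.
without_prompt = banned_next_tokens(generated, n=3)

print(with_prompt, without_prompt)
```

If the check only sees the tokens generated in the current call, a verbatim copy of an answer from an earlier turn would never be blocked, because that answer is part of the prompt.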

  2. Even with RandomSampler (sampling_topk=50, all other parameters unchanged), the converted model still repeats large fragments.

With the following parameters:

results = generator.generate_batch(
    [tokens],
    sampling_topk=50,
    no_repeat_ngram_size=10,
    max_length=384,
    end_token=["Ċ", "<|endoftext|>"],
    include_prompt_in_result=False,
)

I got the following output:

The following is a common sense conversation of Jill with a John. Goal of Jill is to answer Johns questions honestly.
Here goes some facts about Jill:
Jill have a corgi dog, and her name is Cinnamon.
Jill isn't a cat person
Jill is a dog person.
John: Answer in a verbose and detailed manner: What's your opinion on dogs versus cats?
Jill: I am definitely a dog person. I grew up with dogs, and I have several of them right now. I think dogs are more loyal and affectionate than cats. They are always happy to see you and are willing to play with you. Cats, on the other hand, can be more independent and less needy. They are also more difficult to train and socialize with. In my opinion, dogs make better pets than cats.
John: Give me a meticulous explanation: What's your opinion on dogs versus cats?
Jill: Well, I am a dog person, so I am naturally going to be biased towards dogs. I think dogs are more loyal and affectionate than cats. They are always happy to see you and are willing to play with you. Cats, on the other hand, can be more independent and less needy. They are also more difficult to train and socialize with. Dogs make better pets than cats because they are more social animals and require more physical activity from their owners. They are also more intelligent and trainable than cats. In summary, dogs make better pets than cats due to their loyalty, affection, and social nature.
John: Reply with a witty remark: What's your opinion on dogs versus cats?
Jill: I am sorry, but I am a bit biased towards dogs. I think cats are cute, but they are not as loyal or affectionate as dogs.
John: Answer briefly: What's your opinion on dogs versus cats?
Jill: As I mentioned, I am a dog person, so I prefer dogs over cats.
John: Please answer in a verbose and detailed manner further. Dogs or cats?
Jill: I am definitely a dog person. I grew up with dogs and have several of them right now. I think dogs are more loyal and affectionate than cats. They are always happy to see you and are willing to play with you. Cats are more independent and less needy, but they can also be more difficult to train and socialize with. Dogs require more physical activity from their owners, and they are more intelligent and trainable than cats. In summary, dogs make better pets than cats due to their loyalty, affection, social nature, and need for physical activity.

For example, the first response of the model is tokenized as

['ĠI', 'Ġam', 'Ġdefinitely', 'Ġa', 'Ġdog', 'Ġperson', '.', 'ĠI', 'Ġgrew', 'Ġup', 'Ġwith', 'Ġdogs', ',', 'Ġand', 'ĠI', 'Ġhave', 'Ġseveral', 'Ġof', 'Ġthem', 'Ġright', 'Ġnow', '.', 'ĠI', 'Ġthink', 'Ġdogs', 'Ġare', 'Ġmore', 'Ġloyal', 'Ġand', 'Ġaffectionate', 'Ġthan', 'Ġcats', '.', 'ĠThey', 'Ġare', 'Ġalways', 'Ġhappy', 'Ġto', 'Ġsee', 'Ġyou', 'Ġand', 'Ġare', 'Ġwilling', 'Ġto', 'Ġplay', 'Ġwith', 'Ġyou', '.', 'ĠCats', ',', 'Ġon', 'Ġthe', 'Ġother', 'Ġhand', ',', 'Ġcan', 'Ġbe', 'Ġmore', 'Ġindependent', 'Ġand', 'Ġless', 'Ġneedy', '.', 'ĠThey', 'Ġare', 'Ġalso', 'Ġmore', 'Ġdifficult', 'Ġto', 'Ġtrain', 'Ġand', 'Ġsocialize', 'Ġwith', '.', 'ĠIn', 'Ġmy', 'Ġopinion', ',', 'Ġdogs', 'Ġmake', 'Ġbetter', 'Ġpets', 'Ġthan', 'Ġcats', '.']

and the second as

['ĠWell', ',', 'ĠI', 'Ġam', 'Ġa', 'Ġdog', 'Ġperson', ',', 'Ġso', 'ĠI', 'Ġam', 'Ġnaturally', 'Ġgoing', 'Ġto', 'Ġbe', 'Ġbiased', 'Ġtowards', 'Ġdogs', '.', 'ĠI', 'Ġthink', 'Ġdogs', 'Ġare', 'Ġmore', 'Ġloyal', 'Ġand', 'Ġaffectionate', 'Ġthan', 'Ġcats', '.', 'ĠThey', 'Ġare', 'Ġalways', 'Ġhappy', 'Ġto', 'Ġsee', 'Ġyou', 'Ġand', 'Ġare', 'Ġwilling', 'Ġto', 'Ġplay', 'Ġwith', 'Ġyou', '.', 'ĠCats', ',', 'Ġon', 'Ġthe', 'Ġother', 'Ġhand', ',', 'Ġcan', 'Ġbe', 'Ġmore', 'Ġindependent', 'Ġand', 'Ġless', 'Ġneedy', '.', 'ĠThey', 'Ġare', 'Ġalso', 'Ġmore', 'Ġdifficult', 'Ġto', 'Ġtrain', 'Ġand', 'Ġsocialize', 'Ġwith', '.', 'ĠDogs', 'Ġmake', 'Ġbetter', 'Ġpets', 'Ġthan', 'Ġcats', 'Ġbecause', 'Ġthey', 'Ġare', 'Ġmore', 'Ġsocial', 'Ġanimals', 'Ġand', 'Ġrequire', 'Ġmore', 'Ġphysical', 'Ġactivity', 'Ġfrom', 'Ġtheir', 'Ġowners', '.', 'ĠThey', 'Ġare', 'Ġalso', 'Ġmore', 'Ġintelligent', 'Ġand', 'Ġtrain', 'able', 'Ġthan', 'Ġcats', '.', 'ĠIn', 'Ġsummary', ',', 'Ġdogs', 'Ġmake', 'Ġbetter', 'Ġpets', 'Ġthan', 'Ġcats', 'Ġdue', 'Ġto', 'Ġtheir', 'Ġloyalty', ',', 'Ġaffection', ',', 'Ġand', 'Ġsocial', 'Ġnature', '.']

The two responses share the following text

I think dogs are more loyal and affectionate than cats. They are always happy to see you and are willing to play with you. Cats, on the other hand, can be more independent and less needy. They are also more difficult to train and socialize with.

and the following tokens

['ĠI', 'Ġthink', 'Ġdogs', 'Ġare', 'Ġmore', 'Ġloyal', 'Ġand', 'Ġaffectionate', 'Ġthan', 'Ġcats', '.', 'ĠThey', 'Ġare', 'Ġalways', 'Ġhappy', 'Ġto', 'Ġsee', 'Ġyou', 'Ġand', 'Ġare', 'Ġwilling', 'Ġto', 'Ġplay', 'Ġwith', 'Ġyou', '.', 'ĠCats', ',', 'Ġon', 'Ġthe', 'Ġother', 'Ġhand', ',', 'Ġcan', 'Ġbe', 'Ġmore', 'Ġindependent', 'Ġand', 'Ġless', 'Ġneedy', '.', 'ĠThey', 'Ġare', 'Ġalso', 'Ġmore', 'Ġdifficult', 'Ġto', 'Ġtrain', 'Ġand', 'Ġsocialize', 'Ġwith', '.']
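
A minimal sketch of how such an overlap can be measured: find the longest contiguous run of tokens that two responses have in common and compare its length with no_repeat_ngram_size. The helper name `longest_common_run` is hypothetical, and `first` and `second` are short contiguous excerpts of the two token lists above, not the full responses.

```python
def longest_common_run(a, b):
    """Return the longest contiguous run of tokens that appears in both lists."""
    best_len, best_end = 0, 0
    # prev[j] holds the length of the common run ending at a[i-2] and b[j-1].
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len, best_end = curr[j], i
        prev = curr
    return a[best_end - best_len:best_end]


# Contiguous excerpts of the first and second responses shown above.
first = ["Ġright", "Ġnow", ".", "ĠI", "Ġthink", "Ġdogs", "Ġare", "Ġmore",
         "Ġloyal", "Ġand", "Ġaffectionate", "Ġthan", "Ġcats", ".", "ĠThey"]
second = ["Ġtowards", "Ġdogs", ".", "ĠI", "Ġthink", "Ġdogs", "Ġare", "Ġmore",
          "Ġloyal", "Ġand", "Ġaffectionate", "Ġthan", "Ġcats", ".", "ĠThey"]

run = longest_common_run(first, second)
# A run of 10 or more tokens means at least one 10-gram is repeated.
print(len(run), len(run) >= 10)
```

Running the same helper on the full token lists gives the length of the complete shared fragment.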

Maybe there is some problem with the no_repeat_ngram_size parameter here?

I use the following packages:

  • torch==2.0.1
  • transformers==4.31.0
  • ctranslate2==3.20.0

kliffeup, Sep 26 '23 10:09