
Question regarding creation of Guanaco

NielsRogge opened this issue 1 year ago · 5 comments

Hi folks,

Thanks for this amazing work. I have a question regarding the fine-tuning of Guanaco. Specifically, this model was trained on this dataset: https://huggingface.co/datasets/timdettmers/openassistant-guanaco, which only contains a "text" column holding the entire conversation for each example (i.e. multiple human - assistant - human - assistant interactions).
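For anyone who wants to look at the data layout described above, here is a minimal snippet to inspect it (assuming the datasets library is installed):

from datasets import load_dataset

ds = load_dataset("timdettmers/openassistant-guanaco")
print(ds["train"].column_names)      # should show just the "text" column
print(ds["train"][0]["text"][:300])  # start of one full multi-turn conversation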

However, in the code it seems that when this dataset is specified, the model is trained on these entire conversations rather than only on the assistant completions. Is this correct?
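To make the distinction concrete, by "training only on the assistant completions" I mean masking the human turns out of the loss, roughly like the hypothetical sketch below (this is just an illustration, not code from this repo, and it assumes the conversations use the same "### Human:" / "### Assistant:" markers as the prompt format further down):

import torch

# Hypothetical sketch: give human-turn tokens a label of -100 so only assistant
# tokens contribute to the loss.
def mask_non_assistant(text, tokenizer):
    input_ids, labels = [], []
    for chunk in text.split("### "):
        if not chunk:
            continue
        ids = tokenizer("### " + chunk, add_special_tokens=False).input_ids
        input_ids += ids
        if chunk.startswith("Assistant:"):
            labels += ids                # keep assistant tokens in the loss
        else:
            labels += [-100] * len(ids)  # ignore human tokens in the loss
    return torch.tensor(input_ids), torch.tensor(labels)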

If so, how is it possible that the Guanaco models don't generate several human - assistant - human, etc. interactions when you prompt them? It seems like the model only generates a single assistant completion, after which it emits the EOS token id.
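As a quick sanity check of which id the EOS token actually maps to for this checkpoint, one can print it directly (this is only an illustration using the same tokenizer as in the code below):

from transformers import LlamaTokenizer

tok = LlamaTokenizer.from_pretrained("decapoda-research/llama-7b-hf")
print(tok.eos_token, tok.eos_token_id)  # the token/id that generation would normally stop on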

This is the code I use to generate with Guanaco:

from transformers import LlamaTokenizer, AutoModelForCausalLM, StoppingCriteria, StoppingCriteriaList
from peft import PeftModel 
import torch

# Load the base LLaMA model in bfloat16 on GPU 0, apply the Guanaco adapter, then merge it into the base weights
model = AutoModelForCausalLM.from_pretrained("decapoda-research/llama-7b-hf", torch_dtype=torch.bfloat16, device_map={"": 0})
model = PeftModel.from_pretrained(model, 'timdettmers/guanaco-7b')
model = model.merge_and_unload()

tokenizer = LlamaTokenizer.from_pretrained("decapoda-research/llama-7b-hf")
tokenizer.bos_token_id = 1  # override the BOS token id from this checkpoint's tokenizer config

stop_token_ids = [0]  # token id(s) on which generation should stop

# Custom stopping criterion: stop as soon as the last generated token is one of stop_token_ids
class StopOnTokens(StoppingCriteria):
    def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor, **kwargs) -> bool:
        for stop_id in stop_token_ids:
            if input_ids[0][-1] == stop_id:
                return True
        return False

header = "A chat between a curious human and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions."
query = "How are you?"
prompt = f"### Human: {query}\n### Assistant:"

input_ids = tokenizer(prompt, return_tensors="pt").input_ids
input_ids = input_ids.to(model.device)

stop = StopOnTokens()

generate_kwargs = dict(
  input_ids=input_ids,
  max_new_tokens=1536,
  temperature=0.7,
  do_sample=True,
  top_p=0.9,
  top_k=0,
  repetition_penalty=1.1,
  stopping_criteria=StoppingCriteriaList([stop]))

outputs = model.generate(**generate_kwargs)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
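As a side note, to double-check the EOS behaviour described above, one could append something like this after the generation call (assuming tokenizer.eos_token_id is set correctly for this checkpoint):

# Look only at the newly generated tokens and check whether an explicit EOS was emitted
generated = outputs[0][input_ids.shape[1]:]
print(tokenizer.eos_token_id in generated.tolist())
print(tokenizer.decode(generated, skip_special_tokens=False))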

NielsRogge · Jun 16 '23 11:06