Pattern Guides (LLaMA-7B)

Open dwahdany opened this issue 1 year ago • 0 comments

Pattern guides (regex patterns passed to gen) don't seem to constrain the output the way one would expect:

import guidance
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the LLaMA-7B checkpoint and wrap it for use with guidance
tokenizer = AutoTokenizer.from_pretrained("Neko-Institute-of-Science/LLaMA-7B-HF")
model = AutoModelForCausalLM.from_pretrained("Neko-Institute-of-Science/LLaMA-7B-HF")
llama = guidance.llms.Transformers(model=model, tokenizer=tokenizer, device=5)

# Both gen calls carry a regex pattern that the generated text should match
statement_gen = guidance("""
Today we want to say that our new tech company with the name {{gen 'companyname' max_tokens=10 pattern='[a-zA-Z]{8}'}} went public under the ticker {{gen 'ticker' pattern='[A-Z]{4}' temperature=0.3}}
""")
statement_gen(llm=llama)

I would expect this to output:

Today we want to say that our new tech company with the name NameOfCompany went public under the ticker TCKR

where NameOfCompany is a placeholder for a string matching [a-zA-Z]{8}, i.e. exactly eight letters with no whitespace, and TCKR matches [A-Z]{4}.

Actual output:

Today we want to say that our new tech company with the name ofTechno 2000 is going went public under the ticker TKOO

That is, companyname = "ofTechno 2000 is going" (bad) and ticker = "TKOO" (good).
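To make the mismatch concrete, here is a minimal sketch that checks the two observed values against the patterns from the template. It uses only Python's re module and does not touch guidance; the helper name matches_pattern is made up for this illustration.

```python
import re

# The two patterns from the template above
COMPANY_PATTERN = re.compile(r"[a-zA-Z]{8}")
TICKER_PATTERN = re.compile(r"[A-Z]{4}")


def matches_pattern(pattern: re.Pattern, text: str) -> bool:
    """Return True only if the whole string matches the pattern (hypothetical helper)."""
    return pattern.fullmatch(text) is not None


# Values taken from the actual output reported above
observed = {
    "companyname": "ofTechno 2000 is going",
    "ticker": "TKOO",
}

results = {
    "companyname": matches_pattern(COMPANY_PATTERN, observed["companyname"]),
    "ticker": matches_pattern(TICKER_PATTERN, observed["ticker"]),
}
print(results)

# re.match anchors only at the start, so it shows which prefix satisfies the pattern
prefix_match = COMPANY_PATTERN.match(observed["companyname"])
prefix = prefix_match.group(0) if prefix_match else None
print(prefix)
```

The full companyname string fails the check, but its first eight characters ("ofTechno") do match [a-zA-Z]{8}. So the text seems to follow the pattern at first and then keep going. This only describes the returned string; I am not claiming anything about how guidance applies the pattern internally.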

Do I misunderstand pattern guides or is this an issue?

dwahdany, May 17 '23 12:05