syncode icon indicating copy to clipboard operation
syncode copied to clipboard

Issues with built-in python, java, and go grammars

Open ivnle opened this issue 1 year ago • 1 comments

I am experiencing issues with the built-in grammars for Java, Python, and Go. The Python and Java grammars appear to be ignored, while the Go grammar produces mostly gibberish. Below is a script to reproduce the issue, along with example outputs. I installed with pip install git+https://github.com/uiuc-focal-lab/syncode.git. syncode version = 0.1 .

import torch
from syncode import SyncodeLogitsProcessor
from syncode import Grammar
from transformers import AutoModelForCausalLM, AutoTokenizer

device = 'cuda'
# model_name = "meta-llama/Llama-3.2-1B-Instruct"
model_name = "meta-llama/Llama-3.1-8B-Instruct"
cache_dir = None

model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16, cache_dir=cache_dir).eval().to(device)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Initialize SynCode logits processor for the given grammar

# grammar_str = """ start: month " " day 
              
#               day: /[1-9]/ | /[1-2][0-9]/ | /3[0-1]/
              
#               month: "January" | "February" | "March" | "April" | "May" | "June" | "July" | "August" | "September" | "October" | "November" | "December"
# """
grammar_str = "python"
# grammar_str = "go"
# grammar_str = "java"

date_grammar = Grammar(grammar_str)
syncode_logits_processor = SyncodeLogitsProcessor(grammar=date_grammar, tokenizer=tokenizer, parse_output_only=True)

prompt = f"Write a {grammar_str} function that prints 'hello world' in reverse."
messages = [{"role": "user", "content": prompt}]
prompt = tokenizer.apply_chat_template(
                  messages, tokenize=False, add_generation_prompt=True
            )
print("[PROMPT]", prompt, "\n")

syncode_logits_processor.reset(prompt)

inputs = tokenizer(prompt, return_tensors='pt').input_ids.to(device)

attention_mask = torch.ones_like(inputs)
output = model.generate(
      inputs,
      attention_mask=attention_mask,
      max_length=512, 
      num_return_sequences=1, 
      pad_token_id=tokenizer.eos_token_id, 
      logits_processor=[syncode_logits_processor]
      )
output_str = tokenizer.decode(output[0][len(inputs[0]):], skip_special_tokens=True)
print("[OUTPUT]", output_str)

Python

[PROMPT] <|begin_of_text|><|start_header_id|>system<|end_header_id|>

Cutting Knowledge Date: December 2023
Today Date: 26 Jul 2024

<|eot_id|><|start_header_id|>user<|end_header_id|>

Write a python function that prints 'hello world' in reverse.<|eot_id|><|start_header_id|>assistant<|end_header_id|>

 

[OUTPUT] **Reversing 'Hello World' Function**
=====================================

Here is a simple Python function that prints 'hello world' in reverse:

```python
def print_reverse_hello_world():
    """
    Prints 'hello world' in reverse.
    """
    message = "hello world"
    reversed_message = message[::-1]
    print(reversed_message)

print_reverse_hello_world()
```

**

Java

[PROMPT] <|begin_of_text|><|start_header_id|>system<|end_header_id|>

Cutting Knowledge Date: December 2023
Today Date: 26 Jul 2024

<|eot_id|><|start_header_id|>user<|end_header_id|>

Write a java function that prints 'hello world' in reverse.<|eot_id|><|start_header_id|>assistant<|end_header_id|>

 

[OUTPUT] interface
Here is a simple Java function that prints 'hello world' in reverse:

```java
public class HelloWorld {
    public static void main(String[] args) {
        System.out.println("Hello World");
    }
}
```

Explanation:

- The `System.out.println()` function is used to print the string "Hello World" to the console.
- The `public static void

Go

[PROMPT] <|begin_of_text|><|start_header_id|>system<|end_header_id|>

Cutting Knowledge Date: December 2023
Today Date: 26 Jul 2024

<|eot_id|><|start_header_id|>user<|end_header_id|>

Write a go function that prints 'hello world' in reverse.<|eot_id|><|start_header_id|>assistant<|end_header_id|>

 

[OUTPUT] \ 
\

\

\
...

ivnle avatar Dec 19 '24 22:12 ivnle