
Issue: seed=-1 in llama_cpp.Llama Does Not Ensure Randomness

Open · thongtr-dev opened this issue 9 months ago · 1 comment

When using llama_cpp.Llama with seed=-1, the generated output remains identical across multiple runs, even though -1 is expected to select a fresh random seed. Even after modifying the sampling parameters (temperature, top_k, top_p) and restarting the script, the model keeps producing the same structured content.
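
For context, llama.cpp defines LLAMA_DEFAULT_SEED as 0xFFFFFFFF, which is exactly what -1 wraps to as an unsigned 32-bit value, and treats that value as a "pick a random seed" sentinel. Assuming a reasonably recent llama-cpp-python, the constant is exposed on the binding and can be inspected directly:

import llama_cpp

# LLAMA_DEFAULT_SEED is llama.cpp's "random seed" sentinel; -1 wraps to the
# same unsigned 32-bit value, which is why seed=-1 is expected to randomize.
print(hex(llama_cpp.LLAMA_DEFAULT_SEED))  # expected: 0xffffffff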

Steps to Reproduce:

  1. Load a GGUF model using llama_cpp.Llama with seed=-1.
  2. Use Outlines’ generate.json() with a structured schema.
  3. Run the script multiple times and compare outputs.
  4. Modify sampling settings (e.g., temperature=1.2, top_k=80, top_p=0.7), but observe little to no change in output content.
  5. Restart the script or even the system; the issue persists. (A minimal standalone check is sketched below.)
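
A minimal standalone check of the seed=-1 behavior, independent of Outlines (a sketch; MODEL_PATH is a placeholder for any local GGUF file):

from llama_cpp import Llama

MODEL_PATH = "path/to/model.gguf"  # placeholder: any local GGUF model

# Load the model twice with seed=-1 and draw one completion per instance.
# If seed=-1 really selected a fresh random seed, the two completions
# should (almost always) differ at temperature=1.0.
completions = []
for _ in range(2):
    llm = Llama(model_path=MODEL_PATH, seed=-1, verbose=False)
    out = llm("Write one short sentence about exams.",
              max_tokens=32, temperature=1.0)
    completions.append(out["choices"][0]["text"])

print("identical" if completions[0] == completions[1] else "different")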

Expected Behavior: Each run should produce different exam content when seed=-1 is used, since -1 is expected to select a fresh random seed.

Observed Behavior: The generated output remains unchanged across runs, with only minor formatting differences (e.g., whitespace variations).

Workarounds Attempted (Without Success):

  • Explicitly setting seed=random.randint(0, 2**32 - 1) (see the sketch after this list).
  • Tweaking the input prompt dynamically.
  • Increasing sampling randomness with top_k, top_p, and temperature.
  • Restarting the script/system to clear potential caches.
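
The explicit-seed attempt looked roughly like this (a sketch; the model path is a placeholder). One subtlety worth noting: llama.cpp stores the seed as an unsigned 32-bit integer and reserves 0xFFFFFFFF (= 2**32 - 1) as its random-seed sentinel, so an upper bound of 2**32 - 2 avoids accidentally drawing the sentinel:

import random
from llama_cpp import Llama

# Draw a fresh seed on every run; stay below 0xFFFFFFFF, which llama.cpp
# reserves as its "random seed" sentinel (the value -1 wraps to).
seed = random.randint(0, 2**32 - 2)
llm = Llama(model_path="path/to/model.gguf", seed=seed)  # placeholder path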

Here's the code:

from outlines import models, generate, samplers
from llama_cpp import Llama
import os
import json

from pydantic import BaseModel, Field, field_validator
from typing import List, Literal


class Question(BaseModel):
    question_text: str
    options: List[str] = Field(..., min_length=4, max_length=4)
    correct_option: int = Field(..., ge=0, le=3)

    @field_validator("options")
    @classmethod
    def check_options_length(cls, v):
        # Redundant with the min_length/max_length constraints above,
        # but kept as an explicit guard.
        if len(v) != 4:
            raise ValueError("Each question must have exactly 4 options")
        return v

    @field_validator("correct_option")
    @classmethod
    def check_correct_option(cls, v, info):
        # info.data holds the previously validated fields (Pydantic v2).
        options = info.data.get("options", [])
        if v not in range(len(options)):
            raise ValueError("correct_option must index one of the 4 options")
        return v


class Section(BaseModel):
    section: Literal[1, 2, 3, 4, 5, 6]
    section_name: Literal[
        "Cloze Grammar Vocabulary",
        "Cloze Contextual Vocabulary",
        "Best Arrangement of Utterances",
        "Cloze Informational Comprehension",
        "Reading Comprehension",
        "Reading Comprehension Advanced",
    ]
    passage_text: str
    questions: List[Question]


class ExamSchema(BaseModel):
    sections: List[Section] = Field(..., min_length=6, max_length=6)


exam_schema_json = json.dumps(ExamSchema.model_json_schema())

# Load the Llama model with improved sampling
llm = Llama(
    model_path=os.path.join(os.getcwd(), "src", "models", "Mistral-7B-Instruct-v0.3.Q4_K_M.gguf"),
    n_threads=8,
    n_gpu_layers=0,
    seed=-1,
)

model = models.LlamaCpp(llm)

sampler = samplers.multinomial(1, temperature=1.0)

generator = generate.json(model, exam_schema_json, sampler)

exam_stream = generator.stream(
    "You are an English teacher preparing an exam for Vietnamese students. "
    "Ensure the questions cover a variety of topics and difficulty levels. "
    "Each question must be unique and well-structured.\nOutput:",
    max_tokens=None,
    stop_at=["Q:", "\n"],
)

for stream in exam_stream:
    print(stream)

The first output stream (generator.stream() yields one token at a time; shown concatenated here for readability):

{ "sections": [ { "section": 1, "section_name": "Reading Comprehension", "passage_text": "Nowadays, a big change is taking place in the way we write and consume stories. E

The second output stream (again shown concatenated):

{ "sections": [ { "section": 1, "section_name": "Reading Comprehension", "passage_text": "Nowadays, a big change is taking place in the way we write and consume stories
Both outputs contain the phrase: “Nowadays, a big change is taking place in the way we write and consume stories...”
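
A way to isolate where the determinism comes from is to sample from the same Llama instance directly, without the JSON-constrained generator (a sketch, reusing the llm object from the script above):

# Two unconstrained completions from the same llm instance. If these
# differ while generate.json() output does not, the fixed output points
# at the Outlines sampling path rather than at llama.cpp's seeding.
for i in range(2):
    out = llm.create_completion(
        "Write one short exam question about English grammar.",
        max_tokens=48,
        temperature=1.0,
    )
    print(i, out["choices"][0]["text"])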

— thongtr-dev, Mar 04 '25 18:03

The model is Mistral 7B Instruct v0.3 Q4_K_M GGUF from https://huggingface.co/MaziyarPanahi/Mistral-7B-Instruct-v0.3-GGUF.

— thongtr-dev, Mar 04 '25 18:03