Issue: seed=-1 in llama_cpp.Llama Does Not Ensure Randomness
When using llama_cpp.Llama with seed=-1, the generated output remains identical across multiple runs, even though -1 is expected to pick a fresh random seed each time. Even after modifying sampling parameters (temperature, top_k, top_p) and restarting the script, the model keeps producing the same structured content.
Steps to Reproduce:
- Load a GGUF model using llama_cpp.Llama with seed=-1.
- Use Outlines’ generate.json() with a structured schema.
- Run the script multiple times and compare outputs (a comparison sketch follows this list).
- Modify sampling settings (e.g., temperature=1.2, top_k=80, top_p=0.7), but observe little to no change in output content.
- Even after restarting the script or system, the issue persists.
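For the comparison in step 3, the streamed chunks can be joined and hashed so that outputs from separate runs are easy to diff; a rough sketch, assuming the exam_stream generator from the full script further down:

import hashlib

# Rough sketch: collect the streamed chunks and hash the full text, so outputs
# from separate script runs can be compared at a glance. Assumes the
# `exam_stream` generator defined in the full script below.
full_text = "".join(exam_stream)
print(hashlib.sha256(full_text.encode("utf-8")).hexdigest())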
Expected Behavior: Each run should produce unique exam content when using seed=-1, assuming it enables true randomness.
Observed Behavior: The generated output remains unchanged across runs, with only minor formatting differences (e.g., whitespace variations).
Possible Workarounds Attempted (Without Success):
- Explicitly setting seed=random.randint(0, 2**32 - 1) (see the sketch after this list).
- Tweaking the input prompt dynamically.
- Increasing sampling randomness with top_k, top_p, and temperature.
- Restarting the script/system to clear potential caches.
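The explicit-seed attempt was roughly the following; it is the same loader as in the full script below with only the seed argument changed, and the output still did not vary:

import os
import random
from llama_cpp import Llama

# Rough sketch of the explicit-seed workaround: identical to the loader below,
# except that a random 32-bit seed replaces -1.
llm = Llama(
    model_path=os.path.join(os.getcwd(), "src", "models", "Mistral-7B-Instruct-v0.3.Q4_K_M.gguf"),
    n_threads=8,
    n_gpu_layers=0,
    seed=random.randint(0, 2**32 - 1),
)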
Here's the code:
from outlines import models, generate, samplers
from llama_cpp import Llama
import os
import json
from pydantic import BaseModel, Field, field_validator
from typing import List, Literal
class Question(BaseModel):
    question_text: str
    options: List[str] = Field(..., min_length=4, max_length=4)
    correct_option: int = Field(..., ge=0, le=3)

    @field_validator("options")
    def check_options_length(cls, v):
        if len(v) != 4:
            raise ValueError("Each question must have exactly 4 options")
        return v

    @field_validator("correct_option")
    def check_correct_option(cls, v, values):
        options = values.data.get("options", [])
        if v not in range(len(options)):
            raise ValueError("correct_option must be an integer between 0 and 3")
        return v

class Section(BaseModel):
    section: Literal[1, 2, 3, 4, 5, 6]
    section_name: Literal[
        "Cloze Grammar Vocabulary",
        "Cloze Contextual Vocabulary",
        "Best Arrangement of Utterances",
        "Cloze Informational Comprehension",
        "Reading Comprehension",
        "Reading Comprehension Advanced",
    ]
    passage_text: str
    questions: List[Question]

class ExamSchema(BaseModel):
    sections: List[Section] = Field(..., min_length=6, max_length=6)

exam_schema_json = json.dumps(ExamSchema.model_json_schema())

# Load the Llama model with improved sampling
llm = Llama(
    model_path=os.path.join(os.getcwd(), "src", "models", "Mistral-7B-Instruct-v0.3.Q4_K_M.gguf"),
    n_threads=8,
    n_gpu_layers=0,
    seed=-1,
)

model = models.LlamaCpp(llm)
sampler = samplers.multinomial(1, temperature=1.0)
generator = generate.json(model, exam_schema_json, sampler)

exam_stream = generator.stream(
    "You are an English teacher preparing an exam for Vietnamese students. "
    "Ensure the questions cover a variety of topics and difficulty levels. "
    "Each question must be unique and well-structured.\nOutput:",
    max_tokens=None,
    stop_at=["Q:", "\n"],
)

for stream in exam_stream:
    print(stream)
The first output stream:
{
"
sections
":
[
{
"
section
":
1
,
"
section
name
":
"
Read
ing
Com
pre
hens
ion
",
"
pass
age
text
":
"
Now
adays
,
a
big
change
is
taking
place
in
the
way
we
write
and
consume
stories
.
E
The second output stream:
{
"
sections
":
[
{
"
section
":
1
,
"
section
_
name
":
"
Read
ing
Com
pre
hens
ion
",
"
pass
age
_
text
":
"
Now
adays
,
a
big
change
is
taking
place
in
the
way
we
write
and
consume
stories
Both outputs contain the phrase: “Nowadays, a big change is taking place in the way we write and consume stories...”
The model is Mistral 7B Instruct v0.3 Q4_K_M GGUF from https://huggingface.co/MaziyarPanahi/Mistral-7B-Instruct-v0.3-GGUF.
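To help isolate whether the repetition comes from llama_cpp itself or from Outlines' structured generation, a plain completion without any JSON schema could be compared across separate runs; a rough sketch reusing the llm instance above, with an example prompt and example parameter values:

# Rough sketch of a control check: a direct llama_cpp completion with no
# Outlines schema, reusing the `llm` loaded above. The prompt and parameter
# values here are examples only.
out = llm(
    "Write one sentence about how people consume stories today.",
    max_tokens=40,
    temperature=1.0,
)
print(out["choices"][0]["text"])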