[FEATURE] Support Anthropic's citation feature for structured outputs

Open innicoder opened this issue 7 months ago • 2 comments

Description

Is your feature request related to a problem? Please describe.

I'm extracting data from documents using instructor + Anthropic. I need to prove where each piece of data came from. Without citations, I can't verify or audit my extractions.

Describe the solution you'd like

Enable Anthropic's citation feature in instructor.

When I extract a contract's details, I want to see exactly which sentence provided each field:

# Enable citations
response = client.messages.create(
    model="claude-3-5-sonnet",
    response_model=Contract,
    citations=True
)

# See where each field came from
print(response.company_name)  # "Acme Corp"
print(response.citations)    # "The agreement between Acme Corp..." [chars 24-33]
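
One possible shape for the per-field result requested above, as a minimal sketch. `FieldCitation`, `sources_for`, and the dataclass-based `Contract` are hypothetical names used only for illustration; none of this is existing instructor or Anthropic API, and the sample document is made up.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class FieldCitation:
    """Hypothetical record linking one extracted field to its source span."""
    field_name: str
    cited_text: str
    start_char: int  # inclusive offset into the source document
    end_char: int    # exclusive offset into the source document


@dataclass
class Contract:
    company_name: str
    citations: List[FieldCitation] = field(default_factory=list)

    def sources_for(self, field_name: str) -> List[FieldCitation]:
        """Return every citation recorded for the given field."""
        return [c for c in self.citations if c.field_name == field_name]


# Made-up document and extraction, built by hand for the example.
document = "This agreement is made between Acme Corp and Widget LLC."
start = document.index("Acme Corp")
contract = Contract(
    company_name="Acme Corp",
    citations=[
        FieldCitation("company_name", "Acme Corp", start, start + len("Acme Corp")),
    ],
)

for c in contract.sources_for("company_name"):
    # Slicing the document with the stored offsets recovers the cited text.
    print(c.field_name, repr(document[c.start_char:c.end_char]))
```

Keeping citations keyed by field name would let an auditor check each value independently, which is the point of the request.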

Describe alternatives you've considered

  1. Use Anthropic's API directly - loses instructor's structure
  2. Track sources manually - unreliable
  3. Build custom solution - reinventing the wheel
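
To illustrate why alternative 2 is fragile, here is a minimal sketch of manual source tracking by exact substring search. `find_span` is a hypothetical helper and the document text is made up; it only locates a quote when the extracted value appears verbatim in the source.

```python
from typing import Optional, Tuple


def find_span(document: str, quote: str) -> Optional[Tuple[int, int]]:
    """Return (start, end) character offsets of quote in document, or None."""
    start = document.find(quote)
    if start == -1:
        return None
    return start, start + len(quote)


document = "The agreement between Acme Corp and Widget LLC begins on 1 May."

print(find_span(document, "Acme Corp"))         # exact quote: a span is found
print(find_span(document, "ACME Corporation"))  # normalized value: None
```

As soon as the model normalizes or paraphrases a value, the lookup fails, and a repeated phrase only ever matches its first occurrence.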

Additional context

Essential for: legal docs, medical records, financial reports, research papers.

Anthropic already supports this: https://docs.anthropic.com/en/docs/build-with-claude/citations

We just need instructor to expose it.

innicoder, Aug 24 '25 02:08

This should already work if you use the `` or the `create_with_completion` API

jxnl, Aug 24 '25 02:08

Hey @jxnl, thanks for the quick response. Can you provide an example? I already tried it and it didn't work. Here's my current setup.

The main reason I can see for it not working: citations need the response to come back as text message blocks, not as a ToolUseBlock.

from typing import Any, cast

import instructor
from anthropic import Anthropic
from pydantic import BaseModel


client = instructor.from_anthropic(Anthropic())


class Color(BaseModel):
    grass_color: str
    sky_color: str

# note that client.chat.completions.create will also work
response = client.messages.create_with_completion(
    model="claude-sonnet-4-0",
    max_tokens=20_480,
    temperature=1,
    messages=cast(Any, [
        {
            "role": "user",
            "content": [
                {
                    "type": "document",
                    "source": {
                        "type": "text",
                        "media_type": "text/plain",
                        "data": "The grass is green. The sky is blue.",
                    },
                    "title": "My Document",
                    "context": "This is a trustworthy document.",
                    "citations": {"enabled": True},
                },
                {"type": "text", "text": "What color is the grass and sky?"},
            ],
        }
    ]),
    response_model=Color
)

print(response)
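
The block-type point can be sketched offline. This assumes the response content has been converted to plain dicts shaped like the Messages API JSON, where citations sit on `text` blocks. `collect_citations` is a hypothetical helper, and the sample blocks are hand-written, not real API output.

```python
from typing import Any, Dict, List, Tuple


def collect_citations(content: List[Dict[str, Any]]) -> List[Tuple[str, str, int, int]]:
    """Return (answer_text, cited_text, start, end) for every cited text block."""
    found = []
    for block in content:
        if block.get("type") != "text":
            continue  # tool_use blocks carry structured input, not citations
        for citation in block.get("citations") or []:
            found.append((
                block["text"],
                citation["cited_text"],
                citation["start_char_index"],
                citation["end_char_index"],
            ))
    return found


# Hand-written sample data, not real API output.
text_response = [
    {"type": "text", "text": "According to the document, "},
    {
        "type": "text",
        "text": "the grass is green",
        "citations": [{
            "type": "char_location",
            "cited_text": "The grass is green.",
            "document_index": 0,
            "document_title": "My Document",
            "start_char_index": 0,
            "end_char_index": 19,
        }],
    },
]
tool_response = [
    {
        "type": "tool_use",
        "id": "toolu_example",
        "name": "Color",
        "input": {"grass_color": "green", "sky_color": "blue"},
    },
]

print(collect_citations(text_response))
print(collect_citations(tool_response))  # empty: no text blocks to carry citations
```

If the structured output arrives only as a tool call, there is no text block for a citation to attach to, which matches the behavior described in this comment.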

innicoder, Aug 24 '25 12:08