marker [BUG: Breaking] AttributeError: 'ExtractionOutput' object has no attribute 'metadata'

🧨 Describe the Bug

Hi, the docs weren't clear about how to save output, so I assumed I needed to use from marker.output import save_output. That works well with PdfConverter but not with ExtractionConverter (which I use for structured extraction): I get AttributeError: 'ExtractionOutput' object has no attribute 'metadata'. On top of that, all my attempts to use ollama for structured extraction have failed, while it works well with gemini, but that's a separate issue I guess. (PS: I found this closed issue that matches my second problem with marker + ollama exactly, but I wonder why it's closed because it's still happening: https://github.com/datalab-to/marker/issues/785 )

📄 Input Document

It happens with any PDF, but here's a short 3-page PDF to test: hal.pdf

📤 Output Trace / Stack Trace

Running page extraction: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:03<00:00,  3.42s/it]
Traceback (most recent call last):
  File "/home/kp276129/Documents/ontoflow/pdf_analysis/test.py", line 32, in <module>
    save_output(rendered, output_dir=OUTPUT_DIR, fname_base="hal_extracted_structured")
  File "/nobackup/kp276129/envs/ontoflow/lib/python3.12/site-packages/marker/output.py", line 97, in save_output
    f.write(json.dumps(rendered.metadata, indent=2))
                       ^^^^^^^^^^^^^^^^^
  File "/nobackup/kp276129/envs/ontoflow/lib/python3.12/site-packages/pydantic/main.py", line 1026, in __getattr__
    raise AttributeError(f'{type(self).__name__!r} object has no attribute {item!r}')
AttributeError: 'ExtractionOutput' object has no attribute 'metadata'

⚙️ Environment

Marker version: marker-pdf 1.10.1
Surya version: 0.17.0
Python version: 3.12.3
PyTorch version: 2.9.0+cu126
Transformers version: 4.57.1
Operating System:
- Distributor ID: Ubuntu
- Description: Ubuntu 24.04.3 LTS
- Release: 24.04
- Codename: noble

✅ Expected Behavior

I expected Marker to output hal_extracted_structured.json in OUTPUT_DIR without any error.

📟 Command or Code Used

# https://github.com/datalab-to/marker?tab=readme-ov-file#structured-extraction-beta
from pathlib import Path
from marker.models import create_model_dict
from marker.config.parser import ConfigParser
from marker.converters.extraction import ExtractionConverter
from marker.output import save_output
from templates import PaperMetadata

INPUT_DIR = Path("/home/kp276129/Documents/ontoflow/pdf_analysis/input")
OUTPUT_DIR = Path("/home/kp276129/Documents/ontoflow/pdf_analysis/output")
OUTPUT_DIR.mkdir(parents=True, exist_ok=True)


schema = PaperMetadata.model_json_schema()

config_parser = ConfigParser({
    "page_schema": schema,
    "use_llm": True,
    "disable_image_extraction": True,
    "ollama_base_url": "http://localhost:11434",
    "ollama_model": "gemma3",
    "llm_service": "marker.services.ollama.OllamaService",
})

converter = ExtractionConverter(
    artifact_dict=create_model_dict(),
    config=config_parser.generate_config_dict(),
    llm_service=config_parser.get_llm_service(),
)

rendered = converter(str(INPUT_DIR / "hal.pdf"))
save_output(rendered, output_dir=OUTPUT_DIR, fname_base="hal_extracted_structured")
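
As a stopgap while save_output fails on this output type, the final call above could be swapped for a small helper that writes the result by hand. This is only a sketch under assumptions: dump_result and FakeExtractionOutput are hypothetical names, not part of marker, and the helper assumes the returned object is a pydantic v2 model exposing model_dump() (the traceback goes through pydantic/main.py, which suggests this, but it is not confirmed here).

```python
import json
import tempfile
from pathlib import Path


class FakeExtractionOutput:
    """Hypothetical stand-in for the object returned by the converter."""

    def __init__(self, **fields):
        self._fields = fields

    def model_dump(self):
        # Mimics the pydantic v2 method that returns the model as a dict.
        return dict(self._fields)


def dump_result(rendered, output_dir, fname_base):
    """Write `rendered` to <output_dir>/<fname_base>.json and return the path."""
    if hasattr(rendered, "model_dump"):
        data = rendered.model_dump()  # assumed: pydantic v2 model
    else:
        data = vars(rendered)  # fallback for plain objects
    out_path = Path(output_dir) / f"{fname_base}.json"
    out_path.write_text(
        # default=str keeps the dump from failing on non-JSON-native values.
        json.dumps(data, indent=2, ensure_ascii=False, default=str),
        encoding="utf-8",
    )
    return out_path


# Demo with the stand-in object and a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    fake = FakeExtractionOutput(title="Example", authors=["A", "B"])
    written = dump_result(fake, tmp, "hal_extracted_structured")
    print(written.name)
```

In the script above, the call would then read dump_result(rendered, OUTPUT_DIR, "hal_extracted_structured"), which sidesteps the metadata access entirely.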

Oops, forgot to include my PaperMetadata template:

from __future__ import annotations

from typing import List, Optional
from pydantic import BaseModel, Field


class Figure(BaseModel):
    """Represents a figure, diagram, or image in the document."""

    caption: Optional[str] = Field(
        None, description="The exact caption of the figure, if it exists."
    )
    description: str = Field(
        ...,
        description=(
            "A detailed textual description of what the image shows."
        ),
    )
    page_number: Optional[int] = Field(
        None, description="The number of the page where the figure appears."
    )


class PaperMetadata(BaseModel):
    """Metadata model for a scientific article / technical report."""

    title: str = Field(..., description="Title of the article")
    authors: List[str] = Field(
        default_factory=list, description="List of authors, order preserved"
    )
    affiliations: Optional[List[str]] = Field(
        default=None, description="List of affiliations"
    )
    abstract: Optional[str] = Field(None, description="Summary / abstract")
    keywords: Optional[List[str]] = Field(default=None, description="Keywords")
    doi: Optional[str] = Field(None, description="DOI if present")
    publication_date: Optional[str] = Field(
        None,
        description=(
            "Publication date (ISO 'YYYY-MM-DD' preferred). "
            "Accepted formats: 'YYYY-MM-DD', '25 Jul 2017', 'Submitted on 25 Jul 2017'"
        ),
    )
    journal: Optional[str] = Field(
        None, description="Name of the journal / conference"
    )
    volume: Optional[str] = Field(None, description="Volume")
    issue: Optional[str] = Field(None, description="Issue number")
    pages: Optional[str] = Field(None, description="Pages, e.g. '123-135'")

    figures: Optional[List[Figure]] = Field(
        default_factory=list,
        description="List of all figures, diagrams, and images found in the document.",
    )

Nov 04 '25 08:11 kipavy

I think the problem was an AttributeError that occurred inside the save_output function, but only when it was trying to save the results from an ExtractionConverter. The fix, which has already been applied to the marker/output.py file, was to make the metadata-saving step conditional.

with open(...) as f:
    f.write(json.dumps(rendered.metadata, indent=2))

fix code:

# FIX: Check if the 'metadata' attribute exists before trying to access it.
# ExtractionOutput objects do not have this attribute, causing the bug.
if hasattr(rendered, "metadata"):
    with open(
        os.path.join(output_dir, f"{fname_base}_meta.json"),
        "w+",
        encoding=settings.OUTPUT_ENCODING,
    ) as f:
        f.write(json.dumps(rendered.metadata, indent=2))
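
To check the guard in isolation, here is a rough, self-contained sketch of the same idea. The names write_metadata_if_present, FakeMarkdownOutput and FakeExtractionOutput are hypothetical stand-ins made up for illustration; they are not marker's classes, and the real save_output does more than this.

```python
import json
import os
import tempfile
from dataclasses import dataclass, field


@dataclass
class FakeMarkdownOutput:
    """Hypothetical output type that carries a metadata dict."""
    markdown: str
    metadata: dict = field(default_factory=dict)


@dataclass
class FakeExtractionOutput:
    """Hypothetical output type with no metadata attribute."""
    payload: str


def write_metadata_if_present(rendered, output_dir, fname_base, encoding="utf-8"):
    """Write <fname_base>_meta.json only when `rendered` has metadata.

    Returns the path written, or None when the step was skipped.
    """
    if not hasattr(rendered, "metadata"):
        return None  # nothing to write for this output type
    path = os.path.join(output_dir, f"{fname_base}_meta.json")
    with open(path, "w", encoding=encoding) as f:
        f.write(json.dumps(rendered.metadata, indent=2))
    return path


# Demo: the first call writes a file, the second is skipped without raising.
with tempfile.TemporaryDirectory() as tmp:
    with_meta = FakeMarkdownOutput("# Title", {"pages": 3})
    without_meta = FakeExtractionOutput("{}")
    print(write_metadata_if_present(with_meta, tmp, "doc") is not None)
    print(write_metadata_if_present(without_meta, tmp, "extracted") is None)
```

Skipping the metadata file silently is one possible design; another would be to write an empty JSON object so that every run produces the same set of files.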

Nov 07 '25 05:11 gyugut

Hello, yes, that's it, but is there a PR for this? I don't understand how this hasn't been fixed already. I can open the PR.

Nov 07 '25 07:11 kipavy

Sorry, I worded that badly. It's not that the fix is already applied; I fixed it on my side. Go ahead with the PR.

Nov 07 '25 09:11 gyugut

Could you please provide the code that uses the LLM?

Nov 08 '25 12:11 ankit8347