Add support for vLLM as an alternative OpenAI‐compatible backend
Description:
We’d like Marker to be able to route its LLM calls either to OpenAI’s API or to a local/offline vLLM engine. This issue proposes two integration patterns—HTTP drop-in (via vllm serve) and direct Python API—and an optional hybrid factory so that users can choose at runtime.
Problem Statement
- Current state: Marker's `OpenAIService` is hard-wired to talk to `https://api.openai.com/v1` via the `openai` Python client.
- Desire: Allow users to run vLLM (e.g. Qwen2.5-1.5B-Instruct) locally, either as a drop-in HTTP server or via its Python API, without changing any other Marker code or prompts.
Proposed Solutions
1. HTTP “drop-in” (zero code changes to Marker)
- Run:
  ```bash
  vllm serve Qwen/Qwen2.5-1.5B-Instruct \
    --host 0.0.0.0 --port 8000 \
    --generation-config vllm
  ```
- In your `marker.yaml` (or env), set:
  ```yaml
  OPENAI_BASE_URL: http://localhost:8000/v1
  OPENAI_API_KEY: ANY_NONEMPTY_STRING
  OPENAI_MODEL: Qwen/Qwen2.5-1.5B-Instruct
  ```
- Marker's `OpenAIService` will automatically talk to vLLM as if it were OpenAI.
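Before pointing Marker at it, you can sanity-check that the local server really behaves like the OpenAI API with the stock `openai` client. A minimal sketch, assuming `vllm serve` is running on `localhost:8000` with the model above:

```python
from openai import OpenAI

# Point the standard OpenAI client at the local vLLM server.
# vLLM does not validate the key by default, so any non-empty string works.
client = OpenAI(api_key="EMPTY", base_url="http://localhost:8000/v1")

resp = client.chat.completions.create(
    model="Qwen/Qwen2.5-1.5B-Instruct",
    messages=[{"role": "user", "content": "Reply with the single word: pong"}],
)
print(resp.choices[0].message.content)
```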
2. Direct Python integration via a new VLLMService
- Create a subclass of `BaseService` (see the sketch below) that:
  - Instantiates `vllm.LLM(model=…)` once
  - Accepts prompts, calls `engine.generate([prompt], SamplingParams(…))`
  - Parses `outputs[0].outputs[0].text` into your Pydantic schema
  - Updates block metadata
- Wire it into your service-factory or DI container:
  ```python
  def make_service(backend: str, **cfg):
      return VLLMService(**cfg) if backend == "vllm" else OpenAIService(**cfg)
  ```
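A minimal sketch of such a service, assuming a simplified `BaseService` interface (a `__call__` that receives a prompt, an image, a block, and a Pydantic response schema). `VLLMService`, the import path, and the metadata call are illustrative assumptions; only `LLM`, `SamplingParams`, and `generate` follow vLLM's documented Python API:

```python
from pydantic import BaseModel
from vllm import LLM, SamplingParams

from marker.services import BaseService  # assumed import path


class VLLMService(BaseService):
    """Hypothetical Marker service backed by an in-process vLLM engine."""

    def __init__(self, model_name: str = "Qwen/Qwen2.5-1.5B-Instruct", **kwargs):
        super().__init__(**kwargs)
        # Instantiate the engine once; model loading is expensive.
        self.engine = LLM(model=model_name)

    def __call__(self, prompt: str, image, block, response_schema: type[BaseModel], **kwargs):
        # Greedy decoding keeps structured output deterministic.
        params = SamplingParams(temperature=0.0, max_tokens=1024)
        outputs = self.engine.generate([prompt], params)
        text = outputs[0].outputs[0].text

        # Parse the completion into the caller's Pydantic schema; a real
        # implementation would strip code fences and retry on malformed JSON.
        parsed = response_schema.model_validate_json(text)

        # Bookkeeping on the block (method name assumed; adjust to Marker's API).
        block.update_metadata(llm_request_count=1)
        return parsed.model_dump()
```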
3. Hybrid (optional)
Allow a single Service class to switch per request based on config:
```python
from openai import OpenAI

if use_vllm:
    client = OpenAI(api_key="x", base_url="http://localhost:8000/v1")
else:
    client = OpenAI(api_key=self.key, base_url=self.openai_base_url)
```
Implementation Plan
- Config
  - Add a top-level `LLM_BACKEND` enum: `["openai", "vllm_http", "vllm_python"]` (see the config sketch after this plan).
- HTTP path
  - Document "how to run vllm serve" in README.
  - Verify `OpenAIService` against `http://localhost:8000` endpoints.
- Python path
  - Implement `VLLMService(BaseService)`.
  - Write unit tests mocking `vllm.LLM` to verify JSON parsing and retry logic.
- Factory changes
  - Update the service-factory to choose based on `LLM_BACKEND`.
- Docs + Examples
  - Add a "vLLM backend" section to the Marker docs with config snippets.
- Release
  - Bump version, announce in changelog.
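A minimal sketch of the config enum and factory wiring; `LLMBackend`, `make_service`, and the `marker.services.vllm` module are hypothetical names for illustration (only the `OpenAIService` import path comes from the existing codebase), and `VLLMService` refers to the sketch under "Direct Python integration" above:

```python
from enum import Enum

from marker.services.openai import OpenAIService  # existing Marker service
from marker.services.vllm import VLLMService      # hypothetical module for the sketch above


class LLMBackend(str, Enum):
    OPENAI = "openai"
    VLLM_HTTP = "vllm_http"      # OpenAI-compatible `vllm serve` endpoint
    VLLM_PYTHON = "vllm_python"  # in-process vllm.LLM engine


def make_service(backend: LLMBackend, **cfg):
    """Pick the LLM service implementation based on LLM_BACKEND."""
    if backend == LLMBackend.VLLM_PYTHON:
        return VLLMService(**cfg)
    # "openai" and "vllm_http" both go through the OpenAI-compatible client;
    # for vllm_http, the base URL in cfg points at the local vLLM server.
    return OpenAIService(**cfg)
```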
Tasks
- [ ] Define `LLM_BACKEND` config enum
- [ ] Add README section on `vllm serve` usage
- [ ] Write `VLLMService` class
- [ ] Update service factory / DI wiring
- [ ] Coverage tests for both HTTP & Python paths (see the test sketch below)
- [ ] Update documentation and examples
- [ ] Cut a patch release
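A sketch of what the JSON-parsing unit test could look like, assuming the hypothetical `VLLMService` outlined above; the engine is replaced by a stub that mimics the shape of `vllm.LLM.generate()`'s return value so no model is loaded:

```python
from types import SimpleNamespace
from unittest.mock import MagicMock

from pydantic import BaseModel

from marker.services.vllm import VLLMService  # hypothetical module for the sketch above


class DummySchema(BaseModel):
    markdown: str


def _fake_generate(prompts, params):
    # Mimic vllm.LLM.generate(): result[0].outputs[0].text holds the completion
    completion = SimpleNamespace(text='{"markdown": "# Hello"}')
    return [SimpleNamespace(outputs=[completion])]


def test_vllm_service_parses_json():
    # Bypass __init__ so no real vLLM engine is constructed.
    service = VLLMService.__new__(VLLMService)
    service.engine = MagicMock()
    service.engine.generate.side_effect = _fake_generate

    block = MagicMock()
    result = service(
        prompt="Convert this block to markdown",
        image=None,
        block=block,
        response_schema=DummySchema,
    )

    assert result == {"markdown": "# Hello"}
    service.engine.generate.assert_called_once()
```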
Feel free to adjust scope or split into sub-issues!
The reason for doing this is to take advantage of the concurrency, multi-GPU, multi-node, and batching capabilities of the vLLM library.
Thanks for the detailed issue report and for suggesting possible fixes @SaiMadhusudan!
The existing `OpenAIService` already supports any endpoint which is compatible with the OpenAI API spec, including `vllm serve`. As with any configuration option in Marker, you can also specify the base URL of the `OpenAIService` to get your desired functionality. For example:
```bash
marker_single PDF_PATH --use_llm --llm_service marker.services.openai.OpenAIService --openai_api_key EMPTY --openai_base_url http://localhost:8000/v1 --openai_model Qwen/Qwen2.5-1.5B-Instruct
```
or
```python
from marker.models import create_model_dict
from marker.converters.pdf import PdfConverter
from marker.config.parser import ConfigParser

models = create_model_dict()
config_parser = ConfigParser({
    'output_format': 'markdown',
    'use_llm': True,
    # Remaining options here
})
converter = PdfConverter(
    config=config_parser.generate_config_dict(),
    artifact_dict=models,
    processor_list=config_parser.get_processors(),
    renderer=config_parser.get_renderer(),
    llm_service=config_parser.get_llm_service()
)
```
Hope this helps!
Got it.
But I want to use vLLM in Marker. Can you please give me the full code for Google Colab which I can use in my code and learn from? I have a Groq API key.