
Add support for vLLM as an alternative OpenAI‐compatible backend

Open SaiMadhusudan opened this issue 8 months ago • 1 comment

Description:
We’d like Marker to be able to route its LLM calls either to OpenAI’s API or to a local/offline vLLM engine. This issue proposes two integration patterns—HTTP drop-in (via vllm serve) and direct Python API—and an optional hybrid factory so that users can choose at runtime.


Problem Statement

  • Current state: Marker’s OpenAIService is hard-wired to talk to https://api.openai.com/v1 via the openai Python client.
  • Desire: Allow users to run vLLM (e.g. Qwen2.5-1.5B-Instruct) locally—either as a drop-in HTTP server or via its Python API—without changing any other Marker code or prompts.

Proposed Solutions

1. HTTP “drop-in” (zero code changes to Marker)

  • Run
    vllm serve Qwen/Qwen2.5-1.5B-Instruct \
        --host 0.0.0.0 --port 8000 \
        --generation-config vllm
    
  • In your marker.yaml (or env), set:
    OPENAI_BASE_URL: http://localhost:8000/v1
    OPENAI_API_KEY: ANY_NONEMPTY_STRING
    OPENAI_MODEL: Qwen/Qwen2.5-1.5B-Instruct
    
  • Marker’s OpenAIService will automatically talk to vLLM as if it were OpenAI.
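
Before pointing Marker at the server, the endpoint can be sanity-checked with the stock openai client (a minimal sketch; the model name and port match the vllm serve command above):

    from openai import OpenAI

    # vLLM does not validate the API key by default, so any non-empty string works.
    client = OpenAI(api_key="EMPTY", base_url="http://localhost:8000/v1")

    resp = client.chat.completions.create(
        model="Qwen/Qwen2.5-1.5B-Instruct",
        messages=[{"role": "user", "content": "Reply with the single word: ok"}],
        max_tokens=5,
    )
    print(resp.choices[0].message.content)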

2. Direct Python integration via a new VLLMService

  • Create a subclass of BaseService (see the sketch below) that:
    • Instantiates vllm.LLM(model=…) once
    • Accepts prompts, calls engine.generate([prompt], SamplingParams(…))
    • Parses outputs[0].outputs[0].text into your Pydantic schema
    • Updates block metadata
  • Wire it into your service‐factory or DI container:
    def make_service(backend: str, **cfg):
        return VLLMService(**cfg) if backend == "vllm" else OpenAIService(**cfg)
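
A rough sketch of what such a service could look like, using vLLM's offline Python API (LLM and SamplingParams). The VLLMService name and its methods are illustrative; the actual hooks Marker's BaseService expects may differ:

    from vllm import LLM, SamplingParams

    class VLLMService:  # would subclass marker's BaseService in practice
        def __init__(self, model: str = "Qwen/Qwen2.5-1.5B-Instruct", **engine_kwargs):
            # Build the engine once; this loads the model weights onto the GPU(s).
            self.engine = LLM(model=model, **engine_kwargs)

        def generate(self, prompt: str, max_tokens: int = 1024, temperature: float = 0.0) -> str:
            params = SamplingParams(max_tokens=max_tokens, temperature=temperature)
            outputs = self.engine.generate([prompt], params)
            # One RequestOutput per prompt; take the text of its first completion.
            return outputs[0].outputs[0].text

Parsing the returned text into the Pydantic schema and updating block metadata would then follow the same path OpenAIService already uses.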
    

3. Hybrid (optional)

Allow a single Service class to switch per request based on config:

from openai import OpenAI

if use_vllm:
    # Local vLLM server; the key only needs to be non-empty, vLLM ignores it.
    client = OpenAI(api_key="x", base_url="http://localhost:8000/v1")
else:
    # Hosted OpenAI (or any other OpenAI-compatible endpoint) from the service config.
    client = OpenAI(api_key=self.key, base_url=self.openai_base_url)

Implementation Plan

  1. Config
    • Add a top-level LLM_BACKEND enum: ["openai","vllm_http","vllm_python"] (see the sketch after this plan).
  2. HTTP path
    • Document “how to run vllm serve” in README.
    • Verify OpenAIService against http://localhost:8000 endpoints.
  3. Python path
    • Implement VLLMService(BaseService).
    • Write unit tests mocking vllm.LLM to verify JSON parsing, retry logic.
  4. Factory changes
    • Update the service‐factory to choose based on LLM_BACKEND.
  5. Docs + Examples
    • Add a “vLLM backend” section to the Marker docs with config snippets.
  6. Release
    • Bump version, announce in changelog.
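
As a sketch of how items 1 and 4 might fit together (LLMBackend and VLLMService are illustrative names, not existing Marker symbols; OpenAIService is the existing marker.services.openai.OpenAIService):

    from enum import Enum

    from marker.services.openai import OpenAIService

    class LLMBackend(str, Enum):
        OPENAI = "openai"
        VLLM_HTTP = "vllm_http"      # OpenAIService pointed at a vllm serve endpoint
        VLLM_PYTHON = "vllm_python"  # in-process vllm.LLM engine

    def make_service(backend: LLMBackend, **cfg):
        if backend is LLMBackend.VLLM_PYTHON:
            return VLLMService(**cfg)  # the class sketched under "Proposed Solutions"
        # "openai" and "vllm_http" both go through OpenAIService; only the base URL differs.
        return OpenAIService(**cfg)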

Tasks

  • [ ] Define LLM_BACKEND config enum
  • [ ] Add README section on vllm serve usage
  • [ ] Write VLLMService class
  • [ ] Update service factory / DI wiring
  • [ ] Coverage tests for both HTTP & Python paths
  • [ ] Update documentation and examples
  • [ ] Cut a patch release

Feel free to adjust scope or split into sub-issues!

SaiMadhusudan avatar Apr 27 '25 18:04 SaiMadhusudan

The reason for doing this is to use the concurrency, multi-GPU, multi-node, and batching capabilities of the vLLM library.

SaiMadhusudan avatar Apr 27 '25 18:04 SaiMadhusudan

Thanks for the detailed issue report and for suggesting possible fixes @SaiMadhusudan!

The existing OpenAIService already supports any endpoint that is compatible with the OpenAI API spec, including vllm serve. As with any other configuration option in marker, you can specify the base URL of the OpenAIService to get the desired behavior. For example:

marker_single PDF_PATH --use_llm --llm_service marker.services.openai.OpenAIService --openai_api_key EMPTY --openai_base_url http://localhost:8000/v1 --openai_model Qwen/Qwen2.5-1.5B-Instruct

or

from marker.models import create_model_dict
from marker.converters.pdf import PdfConverter
from marker.config.parser import ConfigParser

models = create_model_dict()
config_parser = ConfigParser({
    'output_format': 'markdown',
    'use_llm': True,
    # Remaining options here, e.g. 'llm_service', 'openai_base_url',
    # 'openai_api_key', and 'openai_model' as in the CLI example above
})

converter = PdfConverter(
    config=config_parser.generate_config_dict(),
    artifact_dict=models,
    processor_list=config_parser.get_processors(),
    renderer=config_parser.get_renderer(),
    llm_service=config_parser.get_llm_service()
)
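
To actually run a conversion with this setup, the converter is then called on the PDF path (a short sketch following the usage shown in the marker README):

    from marker.output import text_from_rendered

    rendered = converter("document.pdf")  # path to the PDF to convert
    text, _, images = text_from_rendered(rendered)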

Hope this helps!

tarun-menta avatar May 05 '25 19:05 tarun-menta

Got it

SaiMadhusudan avatar May 06 '25 16:05 SaiMadhusudan

But I want to use vLLM in Marker. Can you please give me full code for Google Colab that I can use in my code and learn from? I have a Groq API key.

rudraaa0012-web avatar Nov 10 '25 12:11 rudraaa0012-web