Add support for vLLM as an alternative OpenAI‐compatible backend
Description:
We’d like Marker to be able to route its LLM calls either to OpenAI’s API or to a local/offline vLLM engine. This issue proposes two integration patterns—HTTP drop-in (via vllm serve) and direct Python API—and an optional hybrid factory so that users can choose at runtime.
Problem Statement
- Current state: Marker's `OpenAIService` is hard-wired to talk to `https://api.openai.com/v1` via the `openai` Python client.
- Desire: Allow users to run vLLM (e.g. Qwen2.5-1.5B-Instruct) locally, either as a drop-in HTTP server or via its Python API, without changing any other Marker code or prompts.
Proposed Solutions
1. HTTP “drop-in” (zero code changes to Marker)
- Run:
  ```bash
  vllm serve Qwen/Qwen2.5-1.5B-Instruct \
    --host 0.0.0.0 --port 8000 \
    --generation-config vllm
  ```
- In your `marker.yaml` (or env), set:
  ```yaml
  OPENAI_BASE_URL: http://localhost:8000/v1
  OPENAI_API_KEY: ANY_NONEMPTY_STRING
  OPENAI_MODEL: Qwen/Qwen2.5-1.5B-Instruct
  ```
- Marker's `OpenAIService` will automatically talk to vLLM as if it were OpenAI.
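Before pointing Marker at it, you can sanity-check that the local server really behaves like the OpenAI API with the stock `openai` client. A minimal sketch, assuming `vllm serve` is running on `localhost:8000` with the model above:

```python
from openai import OpenAI

# Point the standard OpenAI client at the local vLLM server.
# vLLM does not validate the key by default, so any non-empty string works.
client = OpenAI(api_key="EMPTY", base_url="http://localhost:8000/v1")

resp = client.chat.completions.create(
    model="Qwen/Qwen2.5-1.5B-Instruct",
    messages=[{"role": "user", "content": "Reply with the single word: pong"}],
)
print(resp.choices[0].message.content)
```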
2. Direct Python integration via a new VLLMService
- Create a subclass of `BaseService` (see the sketch below) that:
  - Instantiates `vllm.LLM(model=…)` once
  - Accepts prompts, calls `engine.generate([prompt], SamplingParams(…))`
  - Parses `outputs[0].outputs[0].text` into your Pydantic schema
  - Updates block metadata
- Wire it into your service-factory or DI container:
  ```python
  def make_service(backend: str, **cfg):
      return VLLMService(**cfg) if backend == "vllm" else OpenAIService(**cfg)
  ```
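A minimal sketch of such a service, assuming a simplified `BaseService` interface (a `__call__` that receives a prompt, an image, a block, and a Pydantic response schema). `VLLMService`, the import path, and the metadata call are illustrative assumptions; only `LLM`, `SamplingParams`, and `generate` follow vLLM's documented Python API:

```python
from pydantic import BaseModel
from vllm import LLM, SamplingParams

from marker.services import BaseService  # assumed import path


class VLLMService(BaseService):
    """Hypothetical Marker service backed by an in-process vLLM engine."""

    def __init__(self, model_name: str = "Qwen/Qwen2.5-1.5B-Instruct", **kwargs):
        super().__init__(**kwargs)
        # Instantiate the engine once; model loading is expensive.
        self.engine = LLM(model=model_name)

    def __call__(self, prompt: str, image, block, response_schema: type[BaseModel], **kwargs):
        # Greedy decoding keeps structured output deterministic.
        params = SamplingParams(temperature=0.0, max_tokens=1024)
        outputs = self.engine.generate([prompt], params)
        text = outputs[0].outputs[0].text

        # Parse the completion into the caller's Pydantic schema; a real
        # implementation would strip code fences and retry on malformed JSON.
        parsed = response_schema.model_validate_json(text)

        # Bookkeeping on the block (method name assumed; adjust to Marker's API).
        block.update_metadata(llm_request_count=1)
        return parsed.model_dump()
```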
3. Hybrid (optional)
Allow a single Service class to switch per request based on config:
```python
from openai import OpenAI

if use_vllm:
    client = OpenAI(api_key="x", base_url="http://localhost:8000/v1")
else:
    client = OpenAI(api_key=self.key, base_url=self.openai_base_url)
```
Implementation Plan
- Config
  - Add a top-level `LLM_BACKEND` enum: `["openai", "vllm_http", "vllm_python"]` (see the config sketch after this plan).
- HTTP path
  - Document "how to run vllm serve" in README.
  - Verify `OpenAIService` against `http://localhost:8000` endpoints.
- Python path
  - Implement `VLLMService(BaseService)`.
  - Write unit tests mocking `vllm.LLM` to verify JSON parsing and retry logic.
- Factory changes
  - Update the service-factory to choose based on `LLM_BACKEND`.
- Docs + Examples
  - Add a "vLLM backend" section to the Marker docs with config snippets.
- Release
  - Bump version, announce in changelog.
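A minimal sketch of the config enum and factory wiring; `LLMBackend`, `make_service`, and the `marker.services.vllm` module are hypothetical names for illustration (only the `OpenAIService` import path comes from the existing codebase), and `VLLMService` refers to the sketch under "Direct Python integration" above:

```python
from enum import Enum

from marker.services.openai import OpenAIService  # existing Marker service
from marker.services.vllm import VLLMService      # hypothetical module for the sketch above


class LLMBackend(str, Enum):
    OPENAI = "openai"
    VLLM_HTTP = "vllm_http"      # OpenAI-compatible `vllm serve` endpoint
    VLLM_PYTHON = "vllm_python"  # in-process vllm.LLM engine


def make_service(backend: LLMBackend, **cfg):
    """Pick the LLM service implementation based on LLM_BACKEND."""
    if backend == LLMBackend.VLLM_PYTHON:
        return VLLMService(**cfg)
    # "openai" and "vllm_http" both go through the OpenAI-compatible client;
    # for vllm_http, the base URL in cfg points at the local vLLM server.
    return OpenAIService(**cfg)
```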
Tasks
- [ ] Define `LLM_BACKEND` config enum
- [ ] Add README section on `vllm serve` usage
- [ ] Write `VLLMService` class
- [ ] Update service factory / DI wiring
- [ ] Coverage tests for both HTTP & Python paths (see the test sketch below)
- [ ] Update documentation and examples
- [ ] Cut a patch release
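A sketch of what the JSON-parsing unit test could look like, assuming the hypothetical `VLLMService` outlined above; the engine is replaced by a stub that mimics the shape of `vllm.LLM.generate()`'s return value so no model is loaded:

```python
from types import SimpleNamespace
from unittest.mock import MagicMock

from pydantic import BaseModel

from marker.services.vllm import VLLMService  # hypothetical module for the sketch above


class DummySchema(BaseModel):
    markdown: str


def _fake_generate(prompts, params):
    # Mimic vllm.LLM.generate(): result[0].outputs[0].text holds the completion
    completion = SimpleNamespace(text='{"markdown": "# Hello"}')
    return [SimpleNamespace(outputs=[completion])]


def test_vllm_service_parses_json():
    # Bypass __init__ so no real vLLM engine is constructed.
    service = VLLMService.__new__(VLLMService)
    service.engine = MagicMock()
    service.engine.generate.side_effect = _fake_generate

    block = MagicMock()
    result = service(
        prompt="Convert this block to markdown",
        image=None,
        block=block,
        response_schema=DummySchema,
    )

    assert result == {"markdown": "# Hello"}
    service.engine.generate.assert_called_once()
```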
Feel free to adjust scope or split into sub-issues!
The reason for doing this is to take advantage of the concurrency, multi-GPU, multi-node, and batching capabilities of the vLLM library.
Thanks for the detailed issue report and for suggesting possible fixes @SaiMadhusudan!
The existing `OpenAIService` already supports any endpoint which is compatible with the OpenAI API spec, including `vllm serve`. As with any configuration option in Marker, you can also specify the base URL of the `OpenAIService` to get your desired functionality. For example:
```bash
marker_single PDF_PATH --use_llm --llm_service marker.services.openai.OpenAIService --openai_api_key EMPTY --openai_base_url http://localhost:8000/v1 --openai_model Qwen/Qwen2.5-1.5B-Instruct
```
or
```python
from marker.models import create_model_dict
from marker.converters.pdf import PdfConverter
from marker.config.parser import ConfigParser

models = create_model_dict()
config_parser = ConfigParser({
    'output_format': 'markdown',
    'use_llm': True,
    # Remaining options here
})
converter = PdfConverter(
    config=config_parser.generate_config_dict(),
    artifact_dict=models,
    processor_list=config_parser.get_processors(),
    renderer=config_parser.get_renderer(),
    llm_service=config_parser.get_llm_service()
)
```
Hope this helps!
Got it.
But I want to use vLLM in Marker. Can you please give me the full code for Google Colab which I can use in my code and learn from? I have a Groq API key.