marker Images Missing in Markdown Output When Using `marker` Command for Multiple PDFs

Description

When the marker command converts multiple PDFs to markdown in batch mode, the output markdown files contain the extracted text but no images. When the marker_single command converts a single PDF, both text and images appear correctly in the output. This suggests the bug is specific to the batch-processing path of the marker command.

Environment

Operating System: Windows 11 Home Single Language
OS Version: 10.0.26100 Build 26100
System Model: ROG Strix G16 G614JIR
CPU: Intel(R) Core(TM) i9-14900HX (24 cores, 32 logical processors, 3.66 GHz)
GPU: NVIDIA GeForce RTX 4070 Laptop GPU
VRAM: 8.0 GB dedicated, 15.8 GB total
CUDA Version: 11.8 (as indicated by torch 2.6.0+cu118)
PyTorch Version: 2.6.0+cu118
Marker Version: v1.6.1

Steps to Reproduce

Create a folder (e.g., input_folder) containing multiple PDFs with images (e.g., pdf1.pdf, pdf2.pdf).

Run the marker command for batch conversion:

    marker --output_dir .\output_folder .\input_folder --workers 4

Inspect the markdown files in output_folder. The text is present, but images are missing.

For comparison, run the marker_single command on one of the PDFs:

    marker_single .\input_folder\pdf1.pdf --output_dir output_folder_single

The output from marker_single includes both text and images as expected.
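To make the "text present, images missing" comparison repeatable, a small stdlib-only script can scan both output folders and count the image links in each markdown file. This is a sketch, not part of marker: it assumes images are referenced with standard Markdown image syntax (`![alt](path)`) and that each path is relative to the .md file that references it. Since the exact output layout is an assumption, the script simply searches each folder recursively.

```python
import re
import sys
from pathlib import Path

# Matches standard Markdown image syntax ![alt](target) and captures the target.
IMAGE_LINK = re.compile(r"!\[[^\]]*\]\(([^)\s]+)")


def image_report(output_dir):
    """Map each .md file under output_dir to (number of image links, missing targets)."""
    report = {}
    for md_path in sorted(Path(output_dir).rglob("*.md")):
        text = md_path.read_text(encoding="utf-8", errors="replace")
        targets = IMAGE_LINK.findall(text)
        # Assumption: image paths are relative to the markdown file's own folder.
        missing = [t for t in targets if not (md_path.parent / t).is_file()]
        report[md_path] = (len(targets), missing)
    return report


if __name__ == "__main__":
    # Usage: python check_images.py output_folder output_folder_single
    for folder in sys.argv[1:]:
        for md_path, (count, missing) in image_report(folder).items():
            if count == 0:
                status = "NO IMAGES"
            else:
                status = f"{count} image link(s), {len(missing)} missing"
            print(f"{md_path}: {status}")
```

Running it on output_folder and output_folder_single makes the difference visible per file: affected batch outputs should report "NO IMAGES" while the marker_single output lists its image links.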

Expected Behavior

The marker command should generate markdown files that include both text and images from the PDFs, consistent with the behavior of marker_single.

Actual Behavior

When using marker for batch processing, the markdown files contain only text, with no images included. In contrast, marker_single correctly includes both text and images when processing a single PDF.

Possible Cause

The issue might stem from how the marker command handles multiprocessing with CUDA-enabled systems. In the convert.py script, models are loaded in the main process and shared across worker processes, which may not properly support image extraction due to CUDA context requirements. The marker_single command, running in a single process, avoids this problem by loading and using models directly.

Additional Information

No error messages appear during the conversion; the process completes successfully but omits images.
The issue occurs consistently, regardless of the number of PDFs processed.
Hardware and software details (listed above) may help identify if this is specific to certain GPU or CUDA configurations.

Mar 14 '25 21:03 Saketh-Chandra

Not sure where the bug is yet, but I observe that if I modify marker/scripts/convert.py to stop using the global model variable (model_refs) and create the models directly, the bug disappears:

        converter = converter_cls(
            config=config_dict,
            # artifact_dict=model_refs,  # original: models shared from the main process
            artifact_dict=create_model_dict(),  # workaround: load the models in this worker
            processor_list=config_parser.get_processors(),
            renderer=config_parser.get_renderer(),
            llm_service=config_parser.get_llm_service()
        )

Conclusion: sharing the model_dict across processes with torch multiprocessing seems not to work on certain machines (possibly platform-dependent?).

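Edit: the change below fixed it for me, and also explains the observed platform dependence: on MPS devices, model_dict is already set to None, so models are never created and shared from the main process there.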

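    if settings.TORCH_DEVICE == "mps" or settings.TORCH_DEVICE_MODEL == "mps":
        model_dict = None
    else:
        # Workaround: behave like the MPS branch and skip creating
        # and sharing the models in the main process
        model_dict = None
        # create_model_dict()
        # for k, v in model_dict.items():
        #     v.model.share_memory()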

Mar 25 '25 04:03 conjuncts

I see exactly the same behavior described by @Saketh-Chandra when using the marker CLI on multiple PDFs. The images are not processed, and the final rendering is very poor. It works fine with the marker_single command. I tried the fixes from @conjuncts, but they did not solve the problem.

Operating System: Windows 11 Pro
OS Version: 24H2 (1000.26100.54.0)
System Model: Asus ROG Strix Z890-E
CPU: Intel(R) Core(TM) Ultra 9 285K 3.70 GHz (24 cores)
GPU: NVIDIA GeForce RTX 4060TI Laptop GPU
VRAM: 16 GB
CUDA Version: 11.8 (as indicated by torch 2.6.0+cu118)
PyTorch Version: 2.6.0+cu118
Marker Version: v1.6.1

Apr 06 '25 08:04 clevesim