
Single-node Multi-GPU pipeline execution

Open Pedrexus opened this issue 9 months ago • 2 comments

Hello, all. Thank you for the amazing project.

My computer has two 4090s and I am having trouble running it on both of them. The instructions for multi-node don't seem to work, but I am not running with S3; all the files are local on disk.

The second GPU gets populated with model weights (I checked with nvtop), but its usage stays at 0%, and the pipeline keeps logging the same thing:

2025-04-14 15:06:05,041 - __main__ - INFO - Queue remaining: 674
2025-04-14 15:06:05,041 - __main__ - INFO - 
Metric Name                        Lifetime (tokens/sec)     Recently (tokens/sec)
----------------------------------------------------------------------------------
2025-04-14 15:06:05,041 - __main__ - INFO - 
Worker ID | started
----------+--------
0         | 25

How to make it use the second GPU?

Pedrexus avatar Apr 14 '25 06:04 Pedrexus

same problem here

0429ch avatar Apr 14 '25 15:04 0429ch

Assuming you are using two cards, add `"--tp", "2", "--enable-p2p-check"` to the `cmd = []` list around line 505 of olmocr/pipeline.py. For more options, refer to the sglang docs. Ideally this should be turned into a more general method.

 cmd = [
        "python3",
        "-m",
        "sglang.launch_server",
        "--model-path",
        model_name_or_path,
        "--chat-template",
        args.model_chat_template,
        "--tp", "2", "--enable-p2p-check",  # <---------------
        # "--context-length", str(args.model_max_context),  # Commented out due to crashes
        "--port",
        str(SGLANG_SERVER_PORT),
        "--log-level-http",
        "warning",
    ]
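To avoid hard-coding `"2"`, a more general version could derive the tensor-parallel size from the number of visible GPUs. This is just a sketch: the function name `build_sglang_cmd` is hypothetical, and in the real pipeline the GPU count would come from something like `torch.cuda.device_count()` rather than a parameter.

```python
def build_sglang_cmd(model_path: str, chat_template: str,
                     port: int, gpu_count: int) -> list[str]:
    """Build the sglang launch command, sharding the model across
    all visible GPUs when more than one is present."""
    cmd = [
        "python3", "-m", "sglang.launch_server",
        "--model-path", model_path,
        "--chat-template", chat_template,
        "--port", str(port),
        "--log-level-http", "warning",
    ]
    if gpu_count > 1:
        # --tp shards the model weights across GPUs; --enable-p2p-check
        # guards against flaky peer-to-peer links on consumer cards.
        cmd += ["--tp", str(gpu_count), "--enable-p2p-check"]
    return cmd
```

With a single GPU this produces the same command as before; with two or more it appends the parallelism flags automatically.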

Xu-pixel avatar Apr 21 '25 08:04 Xu-pixel

Yes, as @Xu-pixel mentioned, adding --tp or --dp will do the job.

aman-17 avatar Jul 10 '25 20:07 aman-17