Single-node Multi-GPU pipeline execution
Hello, all. Thank you for the amazing project.
My computer has two 4090s and I am having trouble running it on both of them. The multi-node instructions don't seem to apply, since I am not using S3; all the files are local on disk.
The second GPU gets populated with model weights (I checked nvtop), but its usage stays at 0% and it keeps logging the same thing:
2025-04-14 15:06:05,041 - __main__ - INFO - Queue remaining: 674
2025-04-14 15:06:05,041 - __main__ - INFO -
Metric Name Lifetime (tokens/sec) Recently (tokens/sec)
----------------------------------------------------------------------------------
2025-04-14 15:06:05,041 - __main__ - INFO -
Worker ID | started
----------+--------
0 | 25
How to make it use the second GPU?
same problem here
Assuming you are using two cards, add "--tp", "2", "--enable-p2p-check" to the cmd = [] list around line 505 of olmocr/pipeline.py. For more options, refer to the sglang documentation and adapt this into a more general solution.
cmd = [
    "python3",
    "-m",
    "sglang.launch_server",
    "--model-path",
    model_name_or_path,
    "--chat-template",
    args.model_chat_template,
    "--tp", "2", "--enable-p2p-check",  # <--------------- added for two GPUs
    # "--context-length", str(args.model_max_context),  # Commented out due to crashes
    "--port",
    str(SGLANG_SERVER_PORT),
    "--log-level-http",
    "warning",
]
Yes, as @Xu-pixel mentioned, adding --tp or --dp will do the job.
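If you want to avoid hard-coding "2", here is a minimal sketch of a more general approach. Note that `tensor_parallel_size` and `build_sglang_cmd` are hypothetical helper names, not part of olmocr; the sketch infers the GPU count from `CUDA_VISIBLE_DEVICES` and falls back to a single GPU when the variable is unset:

```python
import os


def tensor_parallel_size() -> int:
    """Infer how many GPUs are visible so --tp can be set automatically.

    Sketch only: reads CUDA_VISIBLE_DEVICES (set by the user or a
    scheduler). If the variable is unset, all GPUs are actually visible,
    but we conservatively fall back to 1 rather than probe the hardware.
    """
    visible = os.environ.get("CUDA_VISIBLE_DEVICES")
    if not visible or not visible.strip():
        return 1
    return len([d for d in visible.split(",") if d.strip()])


def build_sglang_cmd(model_path: str, port: int) -> list:
    # Hypothetical helper mirroring the cmd list in olmocr/pipeline.py,
    # with --tp derived from the environment instead of hard-coded.
    tp = tensor_parallel_size()
    cmd = [
        "python3", "-m", "sglang.launch_server",
        "--model-path", model_path,
        "--port", str(port),
    ]
    if tp > 1:
        # Tensor parallelism across all visible GPUs; the p2p check
        # works around peer-access issues on consumer cards like 4090s.
        cmd += ["--tp", str(tp), "--enable-p2p-check"]
    return cmd
```

With `CUDA_VISIBLE_DEVICES=0,1` this yields `--tp 2`, matching the fix above; on a single-GPU box the extra flags are simply omitted.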