Rate limit also occurs for local LLM?

Open 17Reset opened this issue 9 months ago • 0 comments

I'm using the gemma-3-27b-it model that VLLM is running on the backend, specifying openai_base_url as the api interface for VLLM in the marker.

Marker command:

NUM_DEVICES=4 NUM_WORKERS=8 marker_single ug901-vivado-synthesis.pdf --output_dir output --output_format markdown --debug --strip_existing_ocr --disable_image_extraction --use_llm --llm_service=marker.services.openai.OpenAIService --openai_base_url=http://0.0.0.0:30000/v1 --openai_api_key="None" --openai_model="/mnt/Model/Google/gemma-3-27b-it"

Running Log:

xlab@xlab:/mnt/Agent/Marker$ ./marker_run.sh
Loaded layout model s3://layout/2025_02_18 on device cuda with dtype torch.float16
Loaded texify model s3://texify/2025_02_18 on device cuda with dtype torch.float16
Loaded recognition model s3://text_recognition/2025_02_18 on device cuda with dtype torch.float16
Loaded table recognition model s3://table_recognition/2025_02_18 on device cuda with dtype torch.float16
Loaded detection model s3://text_detection/2025_02_28 on device cuda with dtype torch.float16
Loaded detection model s3://inline_math_detection/2025_02_24 on device cuda with dtype torch.float16
Recognizing layout: 100%|████████████████████████████████████████████████████████████████| 50/50 [00:21<00:00,  2.28it/s]
LLM layout relabelling: 123it [02:00,  1.02it/s]
Running OCR Error Detection: 100%|██████████████████████████████████████████████████████| 75/75 [00:00<00:00, 296.21it/s]
Detecting bboxes: 100%|██████████████████████████████████████████████████████████████████| 75/75 [00:26<00:00,  2.81it/s]
Detecting bboxes: 100%|██████████████████████████████████████████████████████████████████| 75/75 [00:25<00:00,  2.93it/s]
Detecting bboxes: 100%|████████████████████████████████████████████████████████████████████| 5/5 [00:01<00:00,  3.40it/s]
Recognizing Text: 100%|██████████████████████████████████████████████████████████████████| 19/19 [00:09<00:00,  2.00it/s]
Recognizing tables: 100%|████████████████████████████████████████████████████████████████| 13/13 [00:08<00:00,  1.58it/s]
LLMTableProcessor running: 42it [02:20,  4.12s/it]Rate limit error: Request timed out.. Retrying in 3 seconds... (Attempt 1/2)
LLMTableProcessor running: 61it [03:59,  5.34s/it]Rate limit error: Request timed out.. Retrying in 6 seconds... (Attempt 2/2)
LLMTableProcessor running: 63it [04:07,  3.93s/it]
LLMTableMergeProcessor running: 0it [00:00, ?it/s]Error code: 400 - {'object': 'error', 'message': 'At most 1 image(s) may be provided in one request.', 'type': 'BadRequestError', 'param': None, 'code': 400}
LLMTableMergeProcessor running: 1it [00:00,  2.57it/s]Error code: 400 - {'object': 'error', 'message': 'At most 1 image(s) may be provided in one request.', 'type': 'BadRequestError', 'param': None, 'code': 400}
Error code: 400 - {'object': 'error', 'message': 'At most 1 image(s) may be provided in one request.', 'type': 'BadRequestError', 'param': None, 'code': 400}
Error code: 400 - {'object': 'error', 'message': 'At most 1 image(s) may be provided in one request.', 'type': 'BadRequestError', 'param': None, 'code': 400}
LLMTableMergeProcessor running: 4it [00:00,  9.16it/s]Error code: 400 - {'object': 'error', 'message': 'At most 1 image(s) may be provided in one request.', 'type': 'BadRequestError', 'param': None, 'code': 400}
Error code: 400 - {'object': 'error', 'message': 'At most 1 image(s) may be provided in one request.', 'type': 'BadRequestError', 'param': None, 'code': 400}
Error code: 400 - {'object': 'error', 'message': 'At most 1 image(s) may be provided in one request.', 'type': 'BadRequestError', 'param': None, 'code': 400}
LLMTableMergeProcessor running: 7it [00:00, 11.02it/s]
LLM processors running: 100%|██████████████████████████████████████████████████████████| 155/155 [02:57<00:00,  1.14s/it]

Apr 02 '25 02:04 17Reset