olmocr: Please add support for loading a local model file, for use without internet access

🚀 The feature, motivation and pitch

Downloading from Hugging Face is too slow, so I downloaded olmOCR-7B-0225-preview from ModelScope and then edited pipeline.py to load the model from a local directory:

async def download_model(model_name_or_path: str):
    logger.info(f"Downloading model '{model_name_or_path}'")
    
    # check for a local model directory first
    if os.path.exists(model_name_or_path):
        logger.info(f"Using local model at '{model_name_or_path}'")
        return model_name_or_path
    
    # otherwise download from the Hugging Face Hub
    try:
        model_path = snapshot_download(repo_id=model_name_or_path)
        logger.info(f"Model download complete '{model_name_or_path}'")
        return model_path
    except Exception as e:
        logger.error(f"Failed to download model: {e}")
        raise
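
One caveat with the check above: os.path.exists is true for any existing path, including an empty or partially copied directory. A minimal sketch of a stricter check follows, assuming the local directory uses the usual Hugging Face checkpoint layout (a config.json plus one or more *.safetensors shards). The helper name looks_like_model_dir is hypothetical and not part of olmocr.

```python
from pathlib import Path


def looks_like_model_dir(path: str) -> bool:
    """Return True if `path` is a directory with config.json and at least one .safetensors file.

    Hypothetical helper, not part of olmocr. The expected file names are an
    assumption based on the usual Hugging Face checkpoint layout.
    """
    p = Path(path)
    if not p.is_dir():
        return False
    has_config = (p / "config.json").is_file()
    has_weights = any(p.glob("*.safetensors"))
    return has_config and has_weights
```

In download_model, this could stand in for the os.path.exists test, so that a mistyped or incomplete path falls through to the download branch instead of being handed to the server.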

After that, olmocr.pipeline reports a lot of 502 Bad Gateway responses:

Loading safetensors checkpoint shards:  75% Completed | 3/4 [09:25<03:05, 185.75s/it]
INFO:httpx:HTTP Request: GET http://localhost:30024/v1/models "HTTP/1.1 502 Bad Gateway"
2025-03-07 22:32:23,995 - __main__ - INFO - Attempt 189: Unexpected status code 502
INFO:httpx:HTTP Request: GET http://localhost:30024/v1/models "HTTP/1.1 502 Bad Gateway"
2025-03-07 22:32:27,092 - __main__ - INFO - Attempt 190: Unexpected status code 502
INFO:httpx:HTTP Request: GET http://localhost:30024/v1/models "HTTP/1.1 502 Bad Gateway"
2025-03-07 22:32:30,188 - __main__ - INFO - Attempt 191: Unexpected status code 502
INFO:httpx:HTTP Request: GET http://localhost:30024/v1/models "HTTP/1.1 502 Bad Gateway"
2025-03-07 22:32:33,272 - __main__ - INFO - Attempt 192: Unexpected status code 502
INFO:httpx:HTTP Request: GET http://localhost:30024/v1/models "HTTP/1.1 502 Bad Gateway"
2025-03-07 22:32:36,360 - __main__ - INFO - Attempt 193: Unexpected status code 502
INFO:httpx:HTTP Request: GET http://localhost:30024/v1/models "HTTP/1.1 502 Bad Gateway"
2025-03-07 22:32:39,449 - __main__ - INFO - Attempt 194: Unexpected status code 50
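
The log looks like a readiness poll: the pipeline requests /v1/models roughly every 3 seconds while the checkpoint shards are still loading, and each 502 is counted as one attempt. The sketch below shows that general pattern only. It is not olmocr's actual implementation, and the names wait_until_ready and http_status, the attempt limit, and the delay are all assumptions chosen for illustration.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_status(url: str, timeout_s: float = 5.0) -> int:
    """Return the HTTP status code for a GET to `url`, or 0 if no response was received."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            return resp.status
    except urllib.error.HTTPError as e:
        # Non-2xx responses such as 502 arrive here; the code is still useful.
        return e.code
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, and similar: the server is not listening yet.
        return 0


def wait_until_ready(
    probe: Callable[[], int],
    max_attempts: int = 300,
    delay_s: float = 3.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call `probe` until it returns 200; return the number of attempts used.

    Raises TimeoutError if the status never becomes 200 within `max_attempts`.
    """
    for _attempt in range(1, max_attempts + 1):
        if probe() == 200:
            return _attempt
        # Any other status (for example 502) is treated as "not ready yet".
        sleep(delay_s)
    raise TimeoutError(f"server not ready after {max_attempts} attempts")
```

A call such as wait_until_ready(lambda: http_status("http://localhost:30024/v1/models")) would keep polling the endpoint seen in the log. With shards loading at about 185 s each, a few hundred 502 responses before the server answers would be consistent with this pattern rather than a sign that the local path was rejected.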

Alternatives

No response

Additional context

No response

Mar 07 '25 14:03 goodmaney

Same here. My solution was to comment out the following line and run with the '--model' parameter; that works for me.

command: python -m olmocr.pipeline ./localworkspace --pdfs /path/to/your.pdf --model /path/to/your/model

Mar 08 '25 07:03 kkyeer

Hi, this repo uses Hugging Face to download its pre-trained models, so what you need to do is preserve the default Hugging Face cache folder, ~/.cache/huggingface/. For cloud deployments, you can use a network volume to keep the cache folder.
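
To check whether the cache folder already holds the model before going offline, something like the sketch below may help. It assumes the standard huggingface_hub cache layout (hub/models--ORG--NAME/snapshots/REVISION) and honours the HF_HUB_CACHE and HF_HOME environment variables; it ignores XDG_CACHE_HOME for brevity. The function names are hypothetical, and the repo id "allenai/olmOCR-7B-0225-preview" used in the note afterwards is an assumption about where the model lives on the Hub.

```python
import os
from pathlib import Path
from typing import Optional


def hub_cache_dir() -> Path:
    """Resolve the Hugging Face hub cache directory from the environment."""
    explicit = os.environ.get("HF_HUB_CACHE")
    if explicit:
        return Path(explicit)
    hf_home = os.environ.get("HF_HOME") or os.path.join(
        os.path.expanduser("~"), ".cache", "huggingface"
    )
    return Path(hf_home) / "hub"


def cached_snapshot(repo_id: str, cache_dir: Optional[Path] = None) -> Optional[Path]:
    """Return a cached snapshot directory for `repo_id`, or None if nothing is cached."""
    cache = cache_dir if cache_dir is not None else hub_cache_dir()
    snapshots = cache / ("models--" + repo_id.replace("/", "--")) / "snapshots"
    if not snapshots.is_dir():
        return None
    # If several revisions are cached, this picks the last one by name,
    # which is an arbitrary choice made only for this sketch.
    candidates = sorted(d for d in snapshots.iterdir() if d.is_dir())
    return candidates[-1] if candidates else None
```

If cached_snapshot("allenai/olmOCR-7B-0225-preview") returns a path, that directory can be passed to --model directly. Setting the environment variable HF_HUB_OFFLINE=1 additionally tells huggingface_hub not to attempt network calls.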

Mar 23 '25 10:03 quytoanvu