Please add support for loading a local model file (needed for offline / no-internet use)
🚀 The feature, motivation and pitch
Downloading from the Hugging Face Hub is too slow for me, so I downloaded olmOCR-7B-0225-preview from ModelScope and then edited the pipeline.py code to load from a local directory:
```python
import os
import logging

from huggingface_hub import snapshot_download

logger = logging.getLogger(__name__)


async def download_model(model_name_or_path: str):
    logger.info(f"Downloading model '{model_name_or_path}'")

    # Check whether a local model directory was passed in
    if os.path.exists(model_name_or_path):
        logger.info(f"Using local model at '{model_name_or_path}'")
        return model_name_or_path

    # Otherwise download from the Hugging Face Hub
    try:
        model_path = snapshot_download(repo_id=model_name_or_path)
        logger.info(f"Model download complete '{model_name_or_path}'")
        return model_path
    except Exception as e:
        logger.error(f"Failed to download model: {e}")
        raise
```
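For fully offline machines, `huggingface_hub` can also resolve an already-cached snapshot without touching the network via `local_files_only=True`. A minimal sketch of that idea (the fallback order here is my suggestion, not the pipeline's current behavior):

```python
import os

from huggingface_hub import snapshot_download


def resolve_model(model_name_or_path: str) -> str:
    # A local directory is used as-is.
    if os.path.isdir(model_name_or_path):
        return model_name_or_path

    # Try the local Hugging Face cache only; this raises if the
    # snapshot has never been downloaded on this machine.
    try:
        return snapshot_download(repo_id=model_name_or_path, local_files_only=True)
    except Exception:
        pass

    # Last resort: a normal (network) download.
    return snapshot_download(repo_id=model_name_or_path)
```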
Then olmocr.pipeline reports a lot of 502 Bad Gateway errors:
```
Loading safetensors checkpoint shards: 75% Completed | 3/4 [09:25<03:05, 185.75s/it]
INFO:httpx:HTTP Request: GET http://localhost:30024/v1/models "HTTP/1.1 502 Bad Gateway"
2025-03-07 22:32:23,995 - __main__ - INFO - Attempt 189: Unexpected status code 502
INFO:httpx:HTTP Request: GET http://localhost:30024/v1/models "HTTP/1.1 502 Bad Gateway"
2025-03-07 22:32:27,092 - __main__ - INFO - Attempt 190: Unexpected status code 502
INFO:httpx:HTTP Request: GET http://localhost:30024/v1/models "HTTP/1.1 502 Bad Gateway"
2025-03-07 22:32:30,188 - __main__ - INFO - Attempt 191: Unexpected status code 502
INFO:httpx:HTTP Request: GET http://localhost:30024/v1/models "HTTP/1.1 502 Bad Gateway"
2025-03-07 22:32:33,272 - __main__ - INFO - Attempt 192: Unexpected status code 502
INFO:httpx:HTTP Request: GET http://localhost:30024/v1/models "HTTP/1.1 502 Bad Gateway"
2025-03-07 22:32:36,360 - __main__ - INFO - Attempt 193: Unexpected status code 502
INFO:httpx:HTTP Request: GET http://localhost:30024/v1/models "HTTP/1.1 502 Bad Gateway"
2025-03-07 22:32:39,449 - __main__ - INFO - Attempt 194: Unexpected status code 502
```
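For what it's worth, those 502s look like the pipeline's readiness check polling the local inference server's /v1/models endpoint while the model is still loading checkpoint shards, so they should stop once loading finishes. A rough sketch of that kind of poll (function name, timings, and retry budget are illustrative, not the pipeline's actual code):

```python
import asyncio
import logging

import httpx

logger = logging.getLogger(__name__)


async def wait_for_server(
    url: str = "http://localhost:30024/v1/models",
    interval: float = 3.0,
    max_attempts: int = 300,
) -> None:
    # A 502 usually means the proxy is up but the model behind it
    # is still loading; keep polling until we get a 200.
    async with httpx.AsyncClient() as client:
        for attempt in range(1, max_attempts + 1):
            try:
                response = await client.get(url)
                if response.status_code == 200:
                    logger.info("Inference server is ready")
                    return
                logger.info(f"Attempt {attempt}: Unexpected status code {response.status_code}")
            except httpx.TransportError as exc:
                logger.info(f"Attempt {attempt}: connection failed ({exc})")
            await asyncio.sleep(interval)
    raise TimeoutError(f"Server at {url} never became ready")
```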
Alternatives
No response
Additional context
No response
Same issue here. My solution is to comment out the offending line and run with a `--model` param; it works for me.

Command: `python -m olmocr.pipeline ./localworkspace --pdfs /path/to/your.pdf --model /path/to/your/model`
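For reference, this is roughly how such a flag plugs in on the CLI side; a minimal sketch assuming the patched `download_model` from above is importable (argument names and the default repo id are illustrative, not copied from the repo):

```python
import argparse
import asyncio

# Assumed import path for the patched function shown earlier.
from olmocr.pipeline import download_model


def main() -> None:
    parser = argparse.ArgumentParser(prog="olmocr.pipeline")
    parser.add_argument("workspace", help="working directory, e.g. ./localworkspace")
    parser.add_argument("--pdfs", nargs="+", required=True, help="PDF files to process")
    # Accepts either a Hugging Face repo id or a path to a local model directory.
    parser.add_argument("--model", default="allenai/olmOCR-7B-0225-preview",
                        help="model repo id or local model directory")
    args = parser.parse_args()

    # With the patched download_model, a local path is returned unchanged.
    model_path = asyncio.run(download_model(args.model))
    print(f"Using model at: {model_path}")


if __name__ == "__main__":
    main()
```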
Hi, this repo uses Hugging Face to download its pre-trained models, so what you need to do is cache its default cache folder, ~/.cache/huggingface/. For cloud deployment, you can use a network volume to persist the cache folder.
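To make that concrete, `huggingface_hub` reads its cache location from the `HF_HOME` environment variable, and `snapshot_download` also accepts an explicit `cache_dir`. A short sketch, with the mount path `/mnt/network-volume` being an example:

```python
import os

# Redirect the Hugging Face cache to a persistent network volume.
# Must be set before importing huggingface_hub (the path is an example).
os.environ["HF_HOME"] = "/mnt/network-volume/huggingface"

from huggingface_hub import snapshot_download

# Or pass the cache location explicitly per call instead.
model_path = snapshot_download(
    repo_id="allenai/olmOCR-7B-0225-preview",
    cache_dir="/mnt/network-volume/huggingface/hub",
)
print(model_path)
```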