feat/Use local model for hi_res partition

Open AntoninLeroy opened this issue 3 months ago • 9 comments

Hello,

Maybe this feature already exists, but I didn't manage to implement it. I work on a network that blocks Hugging Face, and I would like to run:

elements = partition_pdf(filename=PDF_PATH, strategy='hi_res', infer_table_structure=True)

But the function cannot run because it's trying to access the yolox model on the hub:

SSLError: (MaxRetryError("HTTPSConnectionPool(host='huggingface.co', port=443): Max retries exceeded with url: /unstructuredio/yolo_x_layout/resolve/main/yolox_l0.05.onnx (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1129)')))"), '(Request ID: 757ef56e-88d9-4a7a-88ef-ff3fade2139c)')
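For reference, the URL path in that traceback follows the Hugging Face hub "resolve" pattern (`/<repo_id>/resolve/<revision>/<filename>`), so it identifies exactly which file would have to be fetched by other means. A minimal sketch that splits the path into its parts; `HubFile` and `parse_resolve_path` are hypothetical helper names, not part of any library:

```python
from typing import NamedTuple


class HubFile(NamedTuple):
    """Parts of a Hugging Face hub 'resolve' URL path (hypothetical helper type)."""
    repo_id: str
    revision: str
    filename: str


def parse_resolve_path(path: str) -> HubFile:
    """Split '/<repo_id>/resolve/<revision>/<filename>' into its components.

    Assumes the revision contains no '/' (true for 'main' in the traceback).
    """
    repo_id, sep, rest = path.strip("/").partition("/resolve/")
    if not sep or "/" not in rest:
        raise ValueError(f"not a hub resolve path: {path!r}")
    revision, _, filename = rest.partition("/")
    return HubFile(repo_id, revision, filename)


# Path copied from the SSLError message above.
hub_file = parse_resolve_path(
    "/unstructuredio/yolo_x_layout/resolve/main/yolox_l0.05.onnx"
)
print(hub_file.repo_id, hub_file.revision, hub_file.filename)
```

So the missing piece is the file `yolox_l0.05.onnx` from the `unstructuredio/yolo_x_layout` repository, at revision `main`.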

My question is: if I manage to download the model to my machine somehow, how can I use it with the unstructured library without it having to make the HTTPS request?
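One direction that might work, as an untested sketch: the URL in the traceback suggests the model is fetched through `huggingface_hub`, which keeps downloads in a local cache and can be told to stay offline with the `HF_HUB_OFFLINE=1` environment variable. Assuming that is how the library loads the model, a copy placed in the cache (for example, transferred from a machine with access) could be picked up without a network call. The helper below only checks whether such a cached copy exists; `hub_cache_dir` and `find_cached_file` are hypothetical names, and the directory layout is the one `huggingface_hub` documents for its cache:

```python
import os
from pathlib import Path
from typing import Optional


def hub_cache_dir() -> Path:
    """Resolve the hub cache directory from HF_HUB_CACHE / HF_HOME, else the default."""
    if "HF_HUB_CACHE" in os.environ:
        return Path(os.environ["HF_HUB_CACHE"])
    hf_home = os.environ.get("HF_HOME", str(Path.home() / ".cache" / "huggingface"))
    return Path(hf_home) / "hub"


def find_cached_file(
    repo_id: str, filename: str, cache_dir: Optional[Path] = None
) -> Optional[Path]:
    """Return the path of a cached model file, or None if it is not in the cache.

    Expected layout: <cache>/models--<org>--<name>/snapshots/<revision>/<filename>
    """
    cache_dir = cache_dir or hub_cache_dir()
    snapshots = cache_dir / ("models--" + repo_id.replace("/", "--")) / "snapshots"
    if not snapshots.is_dir():
        return None
    for snapshot in sorted(snapshots.iterdir()):
        candidate = snapshot / filename
        if candidate.is_file():
            return candidate
    return None


# Repository and file name taken from the traceback above.
cached = find_cached_file("unstructuredio/yolo_x_layout", "yolox_l0.05.onnx")
if cached is None:
    print("model not found in the local hub cache")
else:
    # Assumption: with the file cached, HF_HUB_OFFLINE=1 stops huggingface_hub
    # from contacting huggingface.co. Set it before importing unstructured.
    os.environ["HF_HUB_OFFLINE"] = "1"
    print(f"using cached model at {cached}")
```

Whether `partition_pdf` then runs fully offline depends on how the library resolves its models, which I have not been able to confirm.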

I hope my explanation is clear enough.

Thanks in advance.

AntoninLeroy, Mar 11 '24 13:03