
'hi_res' and 'fast' strategies taking more time than expected for larger files

Open sanju-arch opened this issue 4 months ago • 0 comments

  1. Large files (such as 'covid19treatmentguidelines2.pdf', attached below) take too long to process. This file takes around 20 minutes.
from unstructured.partition.pdf import partition_pdf
elements = partition_pdf(file_path, strategy="hi_res")
  2. The 'yolox_quantized' model does not run faster as expected (or as described in the documentation).
from unstructured.partition.auto import partition
elements = partition(filename=filename,
                     strategy="hi_res",
                     hi_res_model_name="yolox")
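
To put numbers on the difference between the two strategies, a minimal timing sketch along these lines could be used. `time_call` and `compare_strategies` are hypothetical helper names introduced here for illustration and are not part of unstructured; only `partition_pdf` and its `filename` and `strategy` arguments come from the library.

```python
import time
from typing import Any, Callable, Dict, Sequence, Tuple


def time_call(fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Tuple[Any, float]:
    """Run fn(*args, **kwargs) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def compare_strategies(
    file_path: str, strategies: Sequence[str] = ("fast", "hi_res")
) -> Dict[str, float]:
    """Time partition_pdf once per strategy (hypothetical helper)."""
    # Imported lazily so time_call stays usable without unstructured installed.
    from unstructured.partition.pdf import partition_pdf

    timings: Dict[str, float] = {}
    for strategy in strategies:
        elements, seconds = time_call(
            partition_pdf, filename=file_path, strategy=strategy
        )
        timings[strategy] = seconds
        print(f"{strategy}: {len(elements)} elements in {seconds:.1f}s")
    return timings
```

Calling `compare_strategies("covid19treatmentguidelines2.pdf")` would report one wall-clock measurement per strategy. A single run per strategy is only a rough indication, since the first `hi_res` call may also include model loading time.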

Versions used for the above scenario: unstructured-inference==0.7.24 unstructured==0.12.4 pillow-heif==0.15.0

File: covid19treatmentguidelines2.pdf

sanju-arch, Feb 16 '24 10:02