enhancement: `partition_pdf()` skip unnecessary element sorting

Open christinestraub opened this issue 2 months ago • 0 comments

This PR skips element sorting when determining whether embedded text can be extracted. The elements extracted in this step are returned as final elements only by the `fast` strategy pipeline and are never used by the other strategy pipelines (hi_res, ocr). Removing the sort from this step and performing it later, inside the `fast` strategy pipeline, avoids sorting work that the other pipelines would discard and so reduces execution time.

Summary

  • skip element sorting when determining whether embedded text can be extracted.
  • add `_partition_pdf_with_pdfparser()` function for the `fast` strategy pipeline
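
The idea in the summary can be illustrated with a minimal, self-contained sketch. This is not the library's code: `Element`, `extract_elements`, `can_extract_text`, and `partition_fast` are hypothetical names, and sorting by page and top coordinate is an assumed stand-in for whatever ordering the real pipeline applies. It only shows that the extractability check does not depend on element order, so the sort can be deferred to the one pipeline that returns these elements.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Element:
    """Hypothetical extracted element; not the library's own class."""
    text: str
    page: int
    top: float  # assumed: smaller value means higher on the page


def extract_elements(raw: List[Element]) -> List[Element]:
    """Stand-in for embedded-text extraction; returns elements unsorted."""
    return [el for el in raw if el.text.strip()]


def can_extract_text(raw: List[Element]) -> bool:
    """The check only needs to know whether any text exists, so no sort."""
    return len(extract_elements(raw)) > 0


def partition_fast(raw: List[Element]) -> List[Element]:
    """Only this pipeline returns the extracted elements, so it sorts them."""
    elements = extract_elements(raw)
    # Assumed ordering for illustration: by page, then top-to-bottom.
    return sorted(elements, key=lambda el: (el.page, el.top))


if __name__ == "__main__":
    raw = [
        Element("b", 1, 50.0),
        Element("a", 1, 10.0),
        Element("c", 0, 99.0),
    ]
    print(can_extract_text(raw))
    print([el.text for el in partition_fast(raw)])
```

In this sketch a caller that only needs the yes/no answer pays for extraction but not for sorting, which is the saving the PR describes for the non-`fast` strategies.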

Testing

CI should pass.

christinestraub • May 15 '24 21:05