unstructured
enhancement: `partition_pdf()` skip unnecessary element sorting
This PR skips element sorting in the step that determines whether embedded text can be extracted. The elements extracted in this step are returned as final elements only by the `fast` strategy pipeline; the other strategy pipelines (`hi_res`, `ocr`) never use them.
Removing element sorting from this step and performing it later, inside the `fast` strategy pipeline, means the `hi_res` and `ocr` pipelines no longer pay for a sort whose result they discard, which reduces execution time.
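The idea can be illustrated with a minimal sketch. This is not the actual `unstructured` code: `Element`, `extract_embedded`, `sort_elements`, `partition`, and the `stats` counter are hypothetical names introduced only for illustration, and the non-`fast` branch is a placeholder.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Element:
    text: str
    top: float   # distance from the top of the page (hypothetical field)
    left: float  # distance from the left edge (hypothetical field)


# Counts how often the sort runs, so the saving is observable.
stats: Dict[str, int] = {"sort_calls": 0}


def sort_elements(elements: List[Element]) -> List[Element]:
    """Order elements top-to-bottom, then left-to-right."""
    stats["sort_calls"] += 1
    return sorted(elements, key=lambda e: (e.top, e.left))


def extract_embedded(raw: List[Tuple[str, float, float]]) -> List[Element]:
    """Build elements from embedded text WITHOUT sorting them."""
    return [Element(text, top, left) for text, top, left in raw]


def partition(raw: List[Tuple[str, float, float]], strategy: str) -> List[Element]:
    extracted = extract_embedded(raw)
    # The extractability check does not depend on element order.
    text_extractable = any(e.text.strip() for e in extracted)
    if strategy == "fast" and text_extractable:
        # Only the pipeline that returns these elements sorts them.
        return sort_elements(extracted)
    # Placeholder for the other pipelines, which would build their own
    # elements and ignore `extracted` entirely.
    return []
```

With this arrangement, calling `partition(raw, "fast")` sorts once, while calling it with any other strategy never invokes `sort_elements`.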
Summary
- skip element sorting when determining whether embedded text can be extracted.
- add `_partition_pdf_with_pdfparser()` function for the `fast` strategy pipeline
Testing
CI should pass.