Christine Straub

Results 6 issues of Christine Straub

**Describe the bug**
Some spaces are removed from the text when partitioning a PDF document.

**To Reproduce**
PDF: [rok_20230930_1-1.pdf](https://github.com/Unstructured-IO/unstructured/files/15001636/rok_20230930_1-1.pdf)

```
from unstructured.partition.pdf import partition_pdf

# Partition the attached PDF with the hi_res strategy and table inference enabled.
elements = partition_pdf(
    filename="rok_20230930_1-1.pdf",
    strategy="hi_res",
    infer_table_structure=True,
)
print(str(elements[20]))
```

**Current...

bug
pdf

This PR is a clone of PR https://github.com/Unstructured-IO/unstructured/pull/2600 to run CI / test_chipper and update ingest test fixtures.

This PR aims to skip element sorting when determining whether embedded text can be extracted. The extracted elements in this step are returned as final elements only for the `fast`...
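The idea described above can be sketched as follows. This is only an illustration under stated assumptions, not the code from the PR: the `Element` class, its `y` field, and the functions `has_extractable_text` and `extract_embedded` are hypothetical names. It also reads the truncated sentence as implying that ordering matters only when the extracted elements are returned as final output.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Element:
    # Hypothetical stand-in for an extracted text element.
    text: str
    y: float  # hypothetical vertical position, used only for ordering


def has_extractable_text(elements: List[Element]) -> bool:
    # Extractability depends only on whether any element carries text,
    # so no sorting is needed to answer this question.
    return any(el.text.strip() for el in elements)


def extract_embedded(elements: List[Element], strategy: str) -> List[Element]:
    if not has_extractable_text(elements):
        return []
    if strategy == "fast":
        # Assumption: only here are the elements returned as final output,
        # so only here is the cost of sorting paid.
        return sorted(elements, key=lambda el: el.y)
    return elements


sample = [Element("second", 2.0), Element("first", 1.0)]
fast_result = extract_embedded(sample, "fast")
other_result = extract_embedded(sample, "hi_res")
```

In this sketch the extractability check never triggers a sort. Ordering is applied only on the branch that hands the elements back to the caller.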

This PR aims to pass `kwargs` through the `fast` strategy pipeline, which the previous PR (https://github.com/Unstructured-IO/unstructured/pull/3030) omitted.

### Summary
- pass `kwargs` through `fast` strategy pipeline,...
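As a rough illustration of what forwarding `kwargs` through a pipeline means, here is a minimal sketch. The function names `partition`, `partition_fast`, and `_extract_text`, and the `languages` option, are hypothetical and do not reflect the library's actual internals. The point is only that each layer must accept `**kwargs` and pass them on, or the options are silently dropped.

```python
from typing import Any, Dict, List


def _extract_text(filename: str, **kwargs: Any) -> List[Dict[str, Any]]:
    # Hypothetical innermost step: records the options it received
    # so the forwarding can be observed.
    return [{"source": filename, "options": dict(kwargs)}]


def partition_fast(filename: str, **kwargs: Any) -> List[Dict[str, Any]]:
    # Forward every keyword argument unchanged to the inner step.
    return _extract_text(filename, **kwargs)


def partition(filename: str, strategy: str = "fast", **kwargs: Any) -> List[Dict[str, Any]]:
    # Hypothetical entry point: dispatch on strategy and keep passing kwargs down.
    if strategy != "fast":
        raise NotImplementedError(f"strategy {strategy!r} is not part of this sketch")
    return partition_fast(filename, **kwargs)


result = partition("example.pdf", languages=["eng"])
```

If any intermediate function in such a chain omitted `**kwargs` from its call to the next one, `options` would come back empty even though the caller supplied them.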