unstructured

bug/pdf-splitting-not-case-sensitive

Open jeremydiba opened this issue 6 months ago • 0 comments

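Describe the bug
When a file's extension is .PDF instead of .pdf, PDF splitting does not work as intended: the extension check fails and the document is partitioned without being split, as the log below shows.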

INFO: Preparing to split document for partition.
WARNING: Given file doesn't have '.pdf' extension. Continuing without splitting.
WARNING: File could not be split. Partitioning without split.
parsing data/Douglas Corp/Hazardous Waste/Remedial Investigation Report - 5-14-2010 - DOUGLAS CORP PLATING DIVISION - RCRA1186.PDF
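The warning suggests the extension check only matches a lowercase '.pdf'. The sketch below contrasts such a case-sensitive check with a case-insensitive one. Both functions (is_pdf_case_sensitive, is_pdf) are hypothetical illustrations, not the library's actual code.

```python
from pathlib import Path


def is_pdf_case_sensitive(filename: str) -> bool:
    # Hypothetical check of the kind that would produce the warning above:
    # only a lowercase ".pdf" suffix matches.
    return filename.endswith(".pdf")


def is_pdf(filename: str) -> bool:
    # Case-insensitive check: ".pdf", ".PDF" and ".Pdf" all match.
    return Path(filename).suffix.lower() == ".pdf"


name = "Remedial Investigation Report - 5-14-2010 - DOUGLAS CORP PLATING DIVISION - RCRA1186.PDF"
print(is_pdf_case_sensitive(name))  # False: the file would not be split
print(is_pdf(name))                 # True: the file would be split
```

Normalizing the suffix with .lower() before comparing keeps the check strict about the extension itself (a name like report.pdf.txt still does not match) while ignoring its case.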

To Reproduce
1. Take a PDF file and give it the extension .PDF.
2. Process the file with split_pdfs=true.
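A minimal sketch for preparing the reproduction file, assuming you start from an existing lowercase .pdf. The helper name make_uppercase_pdf_copy is hypothetical. It writes a copy under a different stem rather than only changing the suffix, because on case-insensitive filesystems (the macOS and Windows defaults) sample.pdf and sample.PDF refer to the same file.

```python
import shutil
from pathlib import Path


def make_uppercase_pdf_copy(src: str) -> Path:
    """Copy a PDF to a sibling file whose extension is upper-case '.PDF'.

    Hypothetical helper for reproducing the bug; the new stem avoids
    copying a file onto itself on case-insensitive filesystems.
    """
    src_path = Path(src)
    dst_path = src_path.with_name(f"{src_path.stem}_upper.PDF")
    shutil.copyfile(src_path, dst_path)  # copies the bytes unchanged
    return dst_path
```

The returned path can then be processed with split_pdfs=true as in step 2.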

Expected behavior
The PDF file is split regardless of whether its extension is .pdf or .PDF.

jeremydiba · Aug 05 '24 21:08