unstructured
bug/pdf-splitting-not-case-sensitive
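Describe the bug When a file's extension is .PDF instead of .pdf, the file is not split. The extension check appears to be case-sensitive, so the file is partitioned without splitting and the following is logged: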
INFO: Preparing to split document for partition.
WARNING: Given file doesn't have '.pdf' extension. Continuing without splitting.
WARNING: File could not be split. Partitioning without split.
parsing data/Douglas Corp/Hazardous Waste/Remedial Investigation Report - 5-14-2010 - DOUGLAS CORP PLATING DIVISION - RCRA1186.PDF
To Reproduce Take a PDF file and give it the extension .PDF (uppercase), then process the file with split_pdfs=true.
Expected behavior The PDF file is split regardless of the case of its extension (.pdf, .PDF, .Pdf, and so on).
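A possible fix, sketched under assumptions: I have not checked how the library actually performs the extension test, so `has_pdf_extension` below is a hypothetical helper name and not the existing code. It only illustrates a case-insensitive comparison that would accept the file from the log above.

```python
from pathlib import Path


def has_pdf_extension(filename: str) -> bool:
    """Return True if the file name ends in .pdf, ignoring case.

    Hypothetical helper for illustration; not the library's actual function.
    """
    # Path.suffix returns the final extension including the dot, e.g. ".PDF".
    # Lower-casing it makes ".PDF", ".Pdf" and ".pdf" compare equal.
    return Path(filename).suffix.lower() == ".pdf"


if __name__ == "__main__":
    for name in ("report.pdf", "RCRA1186.PDF", "notes.docx"):
        print(name, has_pdf_extension(name))
```

Checking the suffix rather than the whole name keeps a file such as `pdf_notes.docx` from being treated as a PDF.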