unstructured
Unable to parse single-column csv files
Describe the bug Given a single-column CSV file (see the attached example), parsing fails because the delimiter cannot be determined. See https://github.com/Unstructured-IO/unstructured/blob/4096a38371bae062832b976dc7ebff4184b7991f/unstructured/partition/csv.py#L109 for where the issue is.
To Reproduce Parse the attached file.
Expected behavior It should be parsed successfully and treated as a single-column spreadsheet.
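One possible way to get this behaviour, sketched with only the standard library: restrict `csv.Sniffer` to a short list of candidate delimiters and fall back to a default when none of them can be detected. The names `CANDIDATE_DELIMITERS`, `detect_delimiter` and `read_rows` are hypothetical and the candidate list is an assumption. This is not the code in `unstructured/partition/csv.py`, just an illustration of the fallback idea.

```python
import csv
import io

# Assumption: the delimiters worth considering; adjust as needed.
CANDIDATE_DELIMITERS = ",;\t|"


def detect_delimiter(sample, fallback=","):
    """Return a sniffed delimiter, or `fallback` if none can be determined."""
    # If no candidate delimiter occurs at all, treat the data as single-column.
    if not any(ch in sample for ch in CANDIDATE_DELIMITERS):
        return fallback
    try:
        # Limiting `delimiters` keeps the sniffer from picking a letter or digit.
        return csv.Sniffer().sniff(sample, delimiters=CANDIDATE_DELIMITERS).delimiter
    except csv.Error:
        # Raised when the sniffer cannot determine a delimiter.
        return fallback


def read_rows(text):
    """Parse CSV text into a list of rows using the detected delimiter."""
    delimiter = detect_delimiter(text[:4096])
    return list(csv.reader(io.StringIO(text, newline=""), delimiter=delimiter))


single_column = "alpha\nbeta\ngamma\n"
print(read_rows(single_column))
```

With this fallback, a file that contains no candidate delimiter is read as one value per row instead of raising an error.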
Environment Info unstructured.io 0.11.8, Python 3.10
Additional context
See https://bugs.python.org/issue2078, a 16-year-old Python bug that is relevant to the sniff function used here and to the expected behaviour.
test_single_column.csv