Unable to parse single-column csv files

Open cwang opened this issue 4 months ago • 0 comments

Describe the bug Given a single-column CSV file (see the attached example), parsing fails because the delimiter cannot be determined. The failure originates here: https://github.com/Unstructured-IO/unstructured/blob/4096a38371bae062832b976dc7ebff4184b7991f/unstructured/partition/csv.py#L109

To Reproduce Parse the attached file.

Expected behavior It should be parsed successfully and treated as a single-column spreadsheet.
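
One possible way to get this behavior, shown as a minimal sketch using only the standard library and not the actual code in unstructured: try csv.Sniffer first, and fall back to a default delimiter when it raises csv.Error. The helper names (detect_delimiter, read_rows), the candidate delimiter list, and the 4096-character sample size are all assumptions made for illustration.

```python
import csv
import io

# Assumption: the set of delimiters worth considering. Restricting the
# sniffer to these keeps it from picking an ordinary letter as a delimiter.
CANDIDATE_DELIMITERS = ",;\t|"


def detect_delimiter(sample: str, default: str = ",") -> str:
    """Return the sniffed delimiter, or `default` if sniffing fails."""
    try:
        dialect = csv.Sniffer().sniff(sample, delimiters=CANDIDATE_DELIMITERS)
        return dialect.delimiter
    except csv.Error:
        # Single-column data contains no delimiter to detect, so fall back.
        return default


def read_rows(text: str) -> list[list[str]]:
    """Parse CSV text into rows, tolerating single-column input."""
    delimiter = detect_delimiter(text[:4096])
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))


# Hypothetical single-column input, standing in for the attached file.
single_column = "name\nalice\nbob\ncarol\n"
rows = read_rows(single_column)
print(rows)
```

With this fallback, each line of a single-column file becomes a one-element row, while files that do contain one of the candidate delimiters are still split on the sniffed character.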

Environment Info unstructured.io 0.11.8, Python 3.10

Additional context The sniff function used here is affected by a 16-year-old Python bug that is relevant to the expected behaviour: https://bugs.python.org/issue2078. Attached file: test_single_column.csv

cwang avatar Mar 06 '24 15:03 cwang