csv-reconcile

CSV sniffer needs more data

Status: Open. dbro opened this issue 2 years ago • 2 comments

Hello, I have a TSV file with the following line lengths in characters (the first line is the header):

155 130 656 416 707 950 526 753 186 731 ...

csv-reconcile init gives me the following error:

$ poetry run csv-reconcile init test7.tsv col1_name col2_name

...
  File "/home/me/src/csv-reconcile/csv_reconcile/initdb.py", line 88, in init_db
    searchidx = header.index(searchcol)
                ^^^^^^^^^^^^^^^^^^^^^^^
ValueError: 'col2_name' is not in list

The error goes away if I increase the amount of data fed to the sniffer on this line, changing the previous value of 1024 to 10240:

    dialect = csv.Sniffer().sniff(csvfile.read(10240))

dbro, Oct 06 '23 14:10
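To see why a 1024-character sample is tight for this file, the Python sketch below counts how many complete lines fit into the first sample_size characters, using the ten line lengths listed above. It assumes each reported count excludes the trailing newline (so a line occupies count + 1 characters) and that csvfile.read(n) returns n characters, as Python's text-mode read does. The helper name complete_lines_in_sample is hypothetical.

```python
from itertools import accumulate

# Per-line character counts reported in the issue (first line is the header).
# Assumption: counts exclude the trailing "\n", so each line occupies count + 1 chars.
line_lengths = [155, 130, 656, 416, 707, 950, 526, 753, 186, 731]


def complete_lines_in_sample(lengths, sample_size):
    """Return how many whole lines fit into the first `sample_size` characters."""
    # Offset just past each line's newline: 156, 287, 944, 1361, ...
    ends = accumulate(n + 1 for n in lengths)
    return sum(1 for end in ends if end <= sample_size)


for size in (1024, 10240):
    print(size, complete_lines_in_sample(line_lengths, size))
```

Under these assumptions, 1024 characters cover the header, two data rows, and only the first 80 characters of the fourth line, so the sniffer works from very few complete rows plus a truncated one. 10240 characters cover all ten listed lines (the count is capped at ten only because the rest of the file is elided). If the counts do include the newline, the 1024 result is still three complete lines.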

Thanks. I’ll have a look at making this configurable. Would it be possible to upload that file so I can test it on your data? Definitely not necessary, but it would be nice to be able to confirm.

gitonthescene, Oct 06 '23 20:10
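One way the sample size could be made configurable is sketched below in Python. sniff_dialect and DEFAULT_SAMPLE_SIZE are hypothetical names, not part of csv-reconcile. The function reads up to sample_size characters, drops a trailing partial line when the sample may have been cut off, rewinds the file, and passes the optional delimiters argument through to csv.Sniffer.sniff, which limits the characters the sniffer will accept as the delimiter.

```python
import csv
import io

# Hypothetical default: the issue reports that 1024 was too small and 10240 worked.
DEFAULT_SAMPLE_SIZE = 10240


def sniff_dialect(csvfile, sample_size=DEFAULT_SAMPLE_SIZE, delimiters=None):
    """Guess the dialect of an open text-mode file, then rewind it.

    Hypothetical helper for illustration; not csv-reconcile's actual API.
    """
    sample = csvfile.read(sample_size)
    # If the whole sample was filled, the last line may be cut off mid-row;
    # drop it so the sniffer only sees complete rows.
    if len(sample) == sample_size:
        last_newline = sample.rfind("\n")
        if last_newline != -1:
            sample = sample[: last_newline + 1]
    csvfile.seek(0)  # let the caller re-read from the start, header included
    return csv.Sniffer().sniff(sample, delimiters=delimiters)


# Usage: a tiny in-memory TSV standing in for a real file.
tsv = io.StringIO("public_identifier\tmatch_string\nA1\tfoo bar\nA2\tbaz qux\n")
dialect = sniff_dialect(tsv, delimiters="\t")
header = next(csv.reader(tsv, dialect))
print(repr(dialect.delimiter), header)
```

Restricting delimiters (for example to "\t" for .tsv input) is a complement to a larger sample rather than a replacement: it stops the sniffer from settling on some other character that happens to appear consistently in a short sample.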

Here is a file that fails with a sniffer sample size of 1024 and succeeds with 10240: test-not-working3.tsv.gz

$ poetry run csv-reconcile init --scorer=dice test-not-working3.tsv public_identifier match_string

dbro, Oct 07 '23 23:10