CSV sniffer needs more data
Hello, I have a TSV file whose lines have the following character counts (the first line is the header):
155 130 656 416 707 950 526 753 186 731 ...
csv-reconcile init gives me the following error:
$ poetry run csv-reconcile init test7.tsv col1_name col2_name
...
  File "/home/me/src/csv-reconcile/csv_reconcile/initdb.py", line 88, in init_db
    searchidx = header.index(searchcol)
                ^^^^^^^^^^^^^^^^^^^^^^^
ValueError: 'col2_name' is not in list
The error goes away if I increase the amount of data fed to the sniffer on this line:
    dialect = csv.Sniffer().sniff(csvfile.read(10240))
where I changed the previous value of 1024 to 10240.
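For context on why the sample size could matter here: the first three lines listed above total 941 characters (155 + 130 + 656), so a 1024-character sample ends partway through the 416-character fourth line and gives the sniffer only about three complete rows. The ten listed lines total 5210 characters, so a 10240-character sample covers all of them. The sketch below is one possible way to feed the sniffer a larger sample made of whole lines only. The helper names `trim_to_complete_lines` and `sniff_dialect` and the `sample_chars` parameter are hypothetical and not part of csv-reconcile; only the `csv.Sniffer().sniff(...)` call comes from the line quoted above.

```python
import csv


def trim_to_complete_lines(sample):
    """Drop a trailing partial line so the sniffer only sees whole rows."""
    last_newline = sample.rfind("\n")
    # If there is no newline at all, keep the sample as-is (one long line).
    return sample if last_newline == -1 else sample[: last_newline + 1]


def sniff_dialect(csvfile, sample_chars=10240):
    """Sniff the dialect of an open text file from its first sample_chars characters.

    Hypothetical helper: sample_chars replaces the hard-coded read size.
    """
    sample = csvfile.read(sample_chars)
    csvfile.seek(0)  # rewind so the caller can parse from the start
    if len(sample) == sample_chars:
        # The read filled the whole budget, so it may have stopped mid-line.
        sample = trim_to_complete_lines(sample)
    return csv.Sniffer().sniff(sample)
```

A caller would open the file in text mode and pass the handle, for example `sniff_dialect(open("test7.tsv", newline=""))`. Trimming the partial last line is an extra precaution added for this sketch, not something the reported fix does.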
Thanks. I’ll have a look at making this configurable. Would it be possible to upload that file so I can test it on your data? Definitely not necessary but it would be nice to be able to confirm.
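As an illustration of what a configurable sample size might look like (a sketch only, not the change actually made in csv-reconcile), the function below takes the sample size as a parameter and doubles it until every requested column shows up in the parsed header. The names `find_columns`, `wanted`, `sample_chars`, and `max_chars` are all hypothetical.

```python
import csv


def find_columns(csvfile, wanted, sample_chars=1024, max_chars=1 << 20):
    """Return (dialect, column indexes) for the wanted header names.

    The sniffer sample starts at sample_chars and doubles until the
    header parsed with the sniffed dialect contains every wanted column,
    the whole file has been sampled, or max_chars is reached.
    """
    while True:
        csvfile.seek(0)
        sample = csvfile.read(sample_chars)
        try:
            dialect = csv.Sniffer().sniff(sample)
        except csv.Error:
            dialect = None  # sniffer could not decide; try a larger sample
        if dialect is not None:
            csvfile.seek(0)
            header = next(csv.reader(csvfile, dialect), [])
            if all(col in header for col in wanted):
                return dialect, [header.index(col) for col in wanted]
        whole_file_sampled = len(sample) < sample_chars
        if whole_file_sampled or sample_chars >= max_chars:
            raise ValueError(
                f"columns {wanted!r} not found in header "
                f"(sniffer sample size {sample_chars})"
            )
        sample_chars *= 2
```

With the file from this report, a call such as `find_columns(f, ["col1_name", "col2_name"])` would start at 1024 characters and grow the sample from there. Naming the sample size in the error message is a design choice for this sketch, so that a wrongly sniffed delimiter is easier to diagnose than a bare "is not in list".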
Here is a file that fails with a sniffer sample size of 1024 and succeeds with 10240: test-not-working3.tsv.gz
$ poetry run csv-reconcile init --scorer=dice test-not-working3.tsv public_identifier match_string