csvkit Show sniffed delimiter on exception

# colA,colB
# aaaaa...aaaaa zzzzz...zzzzz  \
# ...                           } 10 or 100 rows
# aaaaa...aaaaa zzzzz...zzzzz  /
#
# \___________/ \___________/
#  1000chars     1000chars

# 10 rows
# "," is used as delimiter
python3 -c "print('colA,colB') ; [print('a'*1000 + ' ' + 'z'*1000) for _ in range(10)]" | csvstat
# => ok

# 100 rows
# " " is used as delimiter
python3 -c "print('colA,colB') ; [print('a'*1000 + ' ' + 'z'*1000) for _ in range(100)]" | csvstat
# => Row 0 has 3 values, but Table only has 2 columns.

In the latter case, the sample is trimmed, losing the header colA,colB, so the space character " " is used as the delimiter.
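To see what a sniffer guesses for a given sample size, here is a minimal sketch that uses the standard library's csv.Sniffer as a stand-in. csvkit's own code path may differ. The helper names guess_delimiter and make_input and the candidate delimiter list are assumptions made for illustration.

```python
import csv


def guess_delimiter(text, sniff_limit=None, candidates=",\t; |"):
    """Return the delimiter csv.Sniffer guesses from the start of text.

    Only the first sniff_limit characters are inspected (all of them when
    sniff_limit is None). Returns None when the sniffer cannot decide.
    """
    sample = text if sniff_limit is None else text[:sniff_limit]
    try:
        return csv.Sniffer().sniff(sample, delimiters=candidates).delimiter
    except csv.Error:
        return None


def make_input(rows):
    """Rebuild the input produced by the python3 -c commands above."""
    lines = ["colA,colB"]
    lines += ["a" * 1000 + " " + "z" * 1000 for _ in range(rows)]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Print the guess for both inputs, with and without a sample limit.
    for rows in (10, 100):
        text = make_input(rows)
        for limit in (1024, None):
            print(rows, limit, repr(guess_delimiter(text, limit)))
```

Running it prints the guess for each combination, which makes it easier to tell whether a surprising result comes from the sample size or from the data itself.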

This behavior was hard for me to diagnose. How about showing which delimiter is used in:

Debug output

$ csvstat -v ...
inferred delimiter: ' '

Error message

$ csvstat -v ...
Row 0 has 3 values, but Table only has 2 columns (delimiter: ' ').

And how about showing a warning when the input exceeds SNIFF_LIMIT?

$ csvstat -v ...
warning: input (XXX bytes) exceeds SNIFF_LIMIT (YYY bytes), delimiter guessing may be incorrect (NOTE: SNIFF_LIMIT can be changed by -y flag)
warning: guessed delimiter: ' '
Row 0 has 3 values, but Table only has 2 columns.
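The debug output and warning proposed above could look roughly like the sketch below. It is not csvkit code: sniff_verbose is a hypothetical helper built on the standard library's csv.Sniffer, and it counts characters rather than bytes because it works on an already decoded string.

```python
import csv
import sys


def sniff_verbose(text, sniff_limit, stream=sys.stderr):
    """Guess the delimiter and report the guess on stream (hypothetical helper)."""
    if len(text) > sniff_limit:
        # Only part of the input is inspected, so the guess may be off.
        print(
            f"warning: input ({len(text)} characters) exceeds sniff limit "
            f"({sniff_limit} characters), delimiter guessing may be incorrect",
            file=stream,
        )
    try:
        dialect = csv.Sniffer().sniff(text[:sniff_limit])
    except csv.Error:
        print("warning: could not guess delimiter", file=stream)
        return None
    print(f"guessed delimiter: {dialect.delimiter!r}", file=stream)
    return dialect.delimiter
```

Writing to a stream argument instead of printing directly keeps the diagnostics off standard output, so they would not mix with the CSV results in a pipeline.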

Jan 09 '19 09:01 wataash

Thanks - we'll try to do this as part of the next version.

Feb 07 '19 13:02 jpmckinney

Hmm, agate raises ValueError for errors like "Row 0 has 3 values, but Table only has 2 columns." in agate/table/__init__.py. We'd have to introduce a new error class (subclassing ValueError, in case anyone catches these). We'd also have to handle it all over the place, because we need access to the reader to print the dialect.
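As a rough illustration of the error-class idea, the sketch below defines a ValueError subclass that carries the delimiter. The names RowLengthError and check_row are hypothetical and do not exist in agate or csvkit. The check only mirrors the wording of the error message quoted above.

```python
class RowLengthError(ValueError):
    """Hypothetical error that remembers which delimiter was in use."""

    def __init__(self, message, delimiter=None):
        self.delimiter = delimiter
        if delimiter is not None:
            # Append the delimiter so it shows up in the printed message.
            message = f"{message.rstrip('.')} (delimiter: {delimiter!r})."
        super().__init__(message)


def check_row(index, row, column_count, delimiter=None):
    """Raise RowLengthError when a row has more values than the table has columns."""
    if len(row) > column_count:
        raise RowLengthError(
            f"Row {index} has {len(row)} values, "
            f"but Table only has {column_count} columns.",
            delimiter,
        )
```

Because the class subclasses ValueError, existing code that catches ValueError would keep working, while callers that know about the new class can read the delimiter attribute.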

Debug output

This is a good idea. As above, we'd have to add it in a lot of places. Happy to merge a PR!

And how about showing a warning when the input exceeds SNIFF_LIMIT?

The snifflimit was reduced in 1.0.7 to avoid sniffing huge files (which is very slow). So, this warning would now be emitted too frequently to be useful.

Oct 18 '23 02:10 jpmckinney
