Show sniffed delimiter on exception

Open wataash opened this issue 6 years ago • 2 comments

# colA,colB
# aaaaa...aaaaa zzzzz...zzzzz  \
# ...                           } 10 or 100 rows
# aaaaa...aaaaa zzzzz...zzzzz  /
#
# \___________/ \___________/
#  1000chars     1000chars

# 10 rows
# "," is used as delimiter
python3 -c "print('colA,colB') ; [print('a'*1000 + ' ' + 'z'*1000) for _ in range(10)]" | csvstat
# => ok

# 100 rows
# " " is used as delimiter
python3 -c "print('colA,colB') ; [print('a'*1000 + ' ' + 'z'*1000) for _ in range(100)]" | csvstat
# => Row 0 has 3 values, but Table only has 2 columns.

In the latter case, the sample is trimmed, losing the header colA,colB, so a space " " is used as the delimiter.

It was tough for me to figure out this behavior. So how about showing "what delimiter is used" in:

  1. Debug output
$ csvstat -v ...
inferred delimiter: ' '
  2. Error message
$ csvstat -v ...
Row 0 has 3 values, but Table only has 2 columns (delimiter: ' ').
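
A minimal sketch of the debug-output idea, using only the standard library's csv.Sniffer. The helper name sniff_and_report and the default limit of 1024 characters are assumptions for illustration, not csvkit's actual API or default:

```python
import csv
import sys


def sniff_and_report(text, sniff_limit=1024, verbose=True):
    """Sniff a dialect from the first `sniff_limit` characters of `text`.

    Hypothetical helper: when `verbose` is set, the inferred delimiter is
    printed to stderr so the user can see what the sniffer decided.
    """
    sample = text[:sniff_limit]
    dialect = csv.Sniffer().sniff(sample)
    if verbose:
        print(f"inferred delimiter: {dialect.delimiter!r}", file=sys.stderr)
    return dialect


# A small, unambiguous sample: every line contains exactly two commas.
dialect = sniff_and_report("colA,colB,colC\n1,2,3\n4,5,6\n")
```

Writing to stderr keeps the report out of the data stream, so it would not interfere with piping the output into another tool.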

And how about showing a warning when the input exceeds SNIFF_LIMIT?

$ csvstat -v ...
warning: input (XXX bytes) exceeds SNIFF_LIMIT (YYY bytes), delimiter guessing may be incorrect (NOTE: SNIFF_LIMIT can be changed by -y flag)
warning: guessed delimiter: ' '
Row 0 has 3 values, but Table only has 2 columns.
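
The warning could be sketched roughly as follows. The function name warn_if_over_sniff_limit is hypothetical, and the message wording simply follows the proposal above; nothing here is taken from csvkit's code:

```python
import sys


def warn_if_over_sniff_limit(input_size, sniff_limit):
    """Return (and print to stderr) a warning if the input exceeds the limit.

    Hypothetical helper. A limit of None or 0 is treated here as
    "no truncation", so no warning is produced in those cases.
    """
    if not sniff_limit or input_size <= sniff_limit:
        return None
    message = (
        f"warning: input ({input_size} bytes) exceeds SNIFF_LIMIT "
        f"({sniff_limit} bytes), delimiter guessing may be incorrect"
    )
    print(message, file=sys.stderr)
    return message


small = warn_if_over_sniff_limit(500, 1024)    # within the limit
large = warn_if_over_sniff_limit(200100, 1024)  # well over the limit
```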

wataash, Jan 09 '19 09:01

Thanks - we'll try to do this as part of the next version.

jpmckinney, Feb 07 '19 13:02

Hmm, agate raises ValueError for errors like "Row 0 has 3 values, but Table only has 2 columns." in agate/table/__init__.py. We'd have to introduce a new error class (subclassing ValueError, in case anyone catches these). We'd also have to handle it all over the place, because we need access to the reader to print the dialect.
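
A rough sketch of what such an error class might look like. RowLengthError and SpaceDialect are hypothetical names used only for illustration; neither exists in agate or csvkit:

```python
import csv


class RowLengthError(ValueError):
    """Hypothetical ValueError subclass that remembers the dialect in use.

    Existing `except ValueError` handlers keep working, while callers that
    have access to the reader can attach the dialect for a clearer message.
    """

    def __init__(self, message, dialect=None):
        self.dialect = dialect
        if dialect is not None:
            # Append the delimiter before the final period of the message.
            message = f"{message.rstrip('.')} (delimiter: {dialect.delimiter!r})."
        super().__init__(message)


class SpaceDialect(csv.excel):
    """Stand-in for a sniffed dialect whose delimiter is a single space."""
    delimiter = " "


try:
    raise RowLengthError(
        "Row 0 has 3 values, but Table only has 2 columns.", SpaceDialect
    )
except ValueError as error:  # still catchable as a plain ValueError
    caught = str(error)
```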

Debug output

This is a good idea. As above, we'd have to add it in a lot of places. Happy to merge a PR!

And how about showing a warning when the input exceeds SNIFF_LIMIT?

The snifflimit was reduced in 1.0.7 to avoid sniffing huge files (which is very slow). So, this warning would now be emitted too frequently to be useful.

jpmckinney, Oct 18 '23 02:10