
Surface errors from CSV connector to sources output

Open · archiewood opened this issue 1 year ago · 1 comment

Background

CSV files are notoriously hard to parse. Evidence uses DuckDB, which handles them well, but parsing often fails without extra configuration.

For example, a failure may look like this:

npm run sources

> [email protected] sources
> evidence sources

✔ Loading plugins & sources
-----
  [Processing] cdc
  deaths ✔ Finished, wrote 0 rows.

However, this is not easy to debug: the output reports that the table finished with 0 rows and gives no reason. If you drop into the DuckDB CLI and run from 'deaths.csv', you get much more helpful, verbose output:

$ from 'deaths.csv';

Conversion Error: CSV Error on Line: 24473
Original Line: LA,2022,November,12 month-ending,Percent with drugs specified,68.9328389,99.5+,0.020997175,Louisiana,Numbers may differ from published reports using final data. See Technical Notes.,**,
Error when converting column "Percent Complete". Could not convert string "99.5+" to 'BIGINT'

Column Percent Complete is being converted as type BIGINT
This type was auto-detected from the CSV file.
Possible solutions:
* Override the type for this column manually by setting the type explicitly, e.g. types={'Percent Complete': 'VARCHAR'}
* Set the sample size to a larger value to enable the auto-detection to scan more values, e.g. sample_size=-1
* Use a COPY statement to automatically derive types from an existing table.

Solution

This error message should be surfaced to the user in the output of npm run sources.
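
As a rough illustration of the requested behaviour, here is a minimal TypeScript sketch. It is not taken from Evidence's codebase: `runSource`, `formatStatus`, `SourceResult`, and `failingLoader` are hypothetical names, the loader is a synchronous stand-in for the real CSV connector, and the failure line format is invented for the example. The idea is only that an error thrown by the loader is captured and printed, rather than being reduced to "wrote 0 rows".

```typescript
// Hypothetical sketch: none of these names come from Evidence itself.
type SourceResult =
  | { ok: true; rows: number }
  | { ok: false; error: string };

// Runs the loader for one source table and captures any error it throws,
// so the caller can report it instead of silently counting zero rows.
function runSource(load: () => unknown[]): SourceResult {
  try {
    return { ok: true, rows: load().length };
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    return { ok: false, error: message };
  }
}

// Builds the status line for a table. The success format mirrors the
// output quoted above; the failure format is an assumption.
function formatStatus(table: string, result: SourceResult): string {
  if (result.ok) {
    return `${table} ✔ Finished, wrote ${result.rows} rows.`;
  }
  return `${table} ✗ Failed: ${result.error}`;
}

// Stand-in for a CSV loader that fails like the DuckDB example above.
function failingLoader(): unknown[] {
  throw new Error(`Could not convert string "99.5+" to 'BIGINT'`);
}

console.log(formatStatus("deaths", runSource(failingLoader)));
```

In a real connector the loader would presumably be asynchronous and the full multi-line DuckDB message (including its "Possible solutions" list) would be worth printing as-is, since that is the part that tells the user how to fix the file.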

archiewood, Sep 19 '24 11:09

It may also be helpful to surface errors from other connectors, though I am unsure about this.

archiewood, Sep 19 '24 11:09