
Improve error messages to include file name having ingestion issues

Open nishantmonu51 opened this issue 1 year ago • 2 comments

If a malformed CSV file is added to a directory, it can cause data ingestion to fail. In that case, the error currently doesn't include the name of the file causing the issue. Add the corrupt file's name to the returned error. Optionally, also add the ability to skip corrupted files.
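To make the request concrete, here is a minimal sketch of the desired behavior in plain Python. It is not Rill's or DuckDB's implementation: `IngestionError`, `read_csv_strict`, `ingest_directory` and the `skip_corrupt` flag are hypothetical names chosen for illustration, and "malformed" is assumed here to mean a parse error or a row whose column count differs from the header.

```python
import csv
from pathlib import Path


class IngestionError(Exception):
    """Hypothetical error type that records which file (and line) failed."""

    def __init__(self, path, line, reason):
        super().__init__(f"failed to ingest {path} (line {line}): {reason}")
        self.path = path
        self.line = line


def read_csv_strict(path):
    """Read one CSV file into a list of dicts, raising IngestionError on a bad row."""
    rows = []
    with open(path, newline="") as f:
        reader = csv.reader(f, strict=True)
        try:
            header = next(reader, None)
            for row in reader:
                # Assumption: a row with the wrong number of columns is malformed.
                if len(row) != len(header):
                    raise IngestionError(
                        path,
                        reader.line_num,
                        f"expected {len(header)} columns, got {len(row)}",
                    )
                rows.append(dict(zip(header, row)))
        except csv.Error as e:
            # Re-raise parser errors with the file name attached.
            raise IngestionError(path, reader.line_num, str(e)) from e
    return rows


def ingest_directory(directory, skip_corrupt=False):
    """Ingest every *.csv file in a directory.

    With skip_corrupt=False the first bad file aborts ingestion, and the
    error names that file. With skip_corrupt=True bad files are skipped
    and their errors are returned alongside the rows.
    """
    rows, skipped = [], []
    for path in sorted(Path(directory).glob("*.csv")):
        try:
            rows.extend(read_csv_strict(path))
        except IngestionError as e:
            if not skip_corrupt:
                raise
            skipped.append(e)
    return rows, skipped
```

Ingesting file by file is what makes it possible to attribute the failure; a single glob-based read would need the underlying engine to report the file name itself.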

nishantmonu51 avatar Aug 07 '24 06:08 nishantmonu51

If this is caused by DuckDB not showing the file name, we should consider just raising the issue in their issue tracker instead.

begelundmuller avatar Aug 19 '24 13:08 begelundmuller

The error message from DuckDB is very helpful on main but not on 1.0.0. Error message from main:

Conversion Error: CSV Error on Line: 1670846
Original Line: B00310,HELLO,2022-07-20 07:02:30,,242.0,,B03404
Error when converting column "pickup_datetime". Could not convert string "HELLO" to 'TIMESTAMP'

Column pickup_datetime is being converted as type TIMESTAMP
This type was auto-detected from the CSV file.
Possible solutions:
* Override the type for this column manually by setting the type explicitly, e.g. types={'pickup_datetime': 'VARCHAR'}
* Set the sample size to a larger value to enable the auto-detection to scan more values, e.g. sample_size=-1
* Use a COPY statement to automatically derive types from an existing table.

  file=data_22.csv
  delimiter = , (Auto-Detected)
  quote = " (Auto-Detected)
  escape = " (Auto-Detected)
  new_line = \n (Auto-Detected)
  header = true (Auto-Detected)
  skip_rows = 0 (Auto-Detected)
  comment = \0 (Auto-Detected)
  date_format =  (Auto-Detected)
  timestamp_format =  (Auto-Detected)
  null_padding=0
  sample_size=20480
  ignore_errors=false
  all_varchar=0
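If a caller wants to surface the file name separately from the rest of the message, it can be pulled out of the text. The sketch below assumes the message keeps the `file=...` line exactly as in the output quoted above; that layout is not a documented contract and may differ between DuckDB versions. `file_from_error` is a hypothetical helper name.

```python
import re

# Matches a line of the form "  file=data_22.csv" as it appears in the quoted message.
_FILE_LINE = re.compile(r"^[ \t]*file[ \t]*=[ \t]*(.+?)[ \t]*$", re.MULTILINE)


def file_from_error(message):
    """Return the file name from a DuckDB CSV error message, or None if absent."""
    match = _FILE_LINE.search(message)
    return match.group(1) if match else None
```

Returning `None` instead of raising keeps the helper safe to call on messages from versions that do not report the file at all.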

I will pick this up once DuckDB 1.1.0 is released, which is scheduled for 2024-09-02.

k-anshul avatar Aug 20 '24 07:08 k-anshul

Nothing needs to be done on our side. The file name will already be part of the error messages (a sample was attached as an image). However, the file name is trimmed.

k-anshul avatar Sep 04 '24 10:09 k-anshul

The trimmed file name will be handled in a separate issue: https://github.com/rilldata/rill/issues/5604

k-anshul avatar Sep 05 '24 05:09 k-anshul