
sniff_csv fails to detect compression when reading from a url that has a query string

Open gabihodoroaga opened this issue 1 year ago • 2 comments

What happens?

sniff_csv fails to detect compression when reading from a URL that has a query string.

For example, this statement

from sniff_csv('https://github.com/duckdb/duckdb/raw/main/data/csv/who.csv.gz?v=1');

fails with this error:

Error: Invalid Input Error: Invalid unicode (byte sequence mismatch) detected in value construction

but this statement succeeds:

from sniff_csv('https://github.com/duckdb/duckdb/raw/main/data/csv/who.csv.gz');
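One plausible reading of the error, assuming the reader falls back to treating the file as uncompressed when detection fails, is that the raw gzip bytes are being validated as UTF-8 text. A gzip stream always begins with the magic bytes 0x1f 0x8b, and 0x8b is a UTF-8 continuation byte that cannot start a character, so decoding fails almost immediately. The Python sketch below only illustrates that byte-level effect on a made-up CSV payload; it does not call DuckDB.

```python
import gzip

# Hypothetical CSV payload, compressed the same way a .csv.gz file would be.
csv_text = "country,year\nAfghanistan,1980\n"
compressed = gzip.compress(csv_text.encode("utf-8"))

# Every gzip stream starts with the magic bytes 0x1f 0x8b.
print(compressed[:2])  # b'\x1f\x8b'

# Reading the compressed bytes as UTF-8 text fails: 0x8b is a continuation
# byte with no preceding lead byte.
try:
    compressed.decode("utf-8")
except UnicodeDecodeError as exc:
    print("decode failed:", exc.reason)  # decode failed: invalid start byte
```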

The root cause is in src/common/virtual_file_system.cpp, line 15 (https://github.com/duckdb/duckdb/blob/4d24f5c660a205bf22a7fd99e36efece798452c4/src/common/virtual_file_system.cpp#L15). It could be fixed by checking whether the path is a URL and, if so, stripping the query string before detecting the compression.
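A minimal sketch of the proposed fix, assuming compression auto-detection is a case-insensitive check of the path's suffix (.gz for gzip, .zst for zstd). The function names detect_compression_naive and detect_compression are hypothetical, and treating any path containing "://" as a URL is an assumption; DuckDB's actual C++ code may differ. The sketch uses Python's urllib.parse.urlsplit to keep only the URL's path component before the suffix check.

```python
from urllib.parse import urlsplit


def _compression_from_suffix(path: str) -> str:
    # Hypothetical: map a (lower-cased) file name suffix to a compression type.
    if path.endswith(".gz"):
        return "gzip"
    if path.endswith(".zst"):
        return "zstd"
    return "uncompressed"


def detect_compression_naive(path: str) -> str:
    # Checks the suffix of the full string, so "?v=1" hides the ".gz" suffix.
    return _compression_from_suffix(path.lower())


def detect_compression(path: str) -> str:
    # Assumption: anything with a scheme separator ("https://", "s3://", ...)
    # is a URL; only its path component is used, so the query string and
    # fragment are ignored.
    lower = path.lower()
    if "://" in lower:
        lower = urlsplit(lower).path
    return _compression_from_suffix(lower)


url = "https://github.com/duckdb/duckdb/raw/main/data/csv/who.csv.gz?v=1"
print(detect_compression_naive(url))  # uncompressed
print(detect_compression(url))        # gzip
```

Restricting the stripping to inputs that look like URLs leaves local file names that happen to contain "?" untouched, and using urlsplit rather than cutting at the first "?" also drops a trailing #fragment.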

I will open a PR for this.

To Reproduce

Run this statement

from sniff_csv('https://github.com/duckdb/duckdb/raw/main/data/csv/who.csv.gz?v=1');

OS:

all

DuckDB Version:

0.9.3

DuckDB Client:

all

Full Name:

Gabriel Hodoroaga

Affiliation:

Bobsled

Have you tried this on the latest nightly build?

I have tested with a nightly build

Have you tried the steps to reproduce? Do they include all relevant data and configuration? Does the issue you report still appear there?

  • [X] Yes, I have

gabihodoroaga · Feb 13 '24 12:02

> I will open a PR for this.

Please don't do that today; we're trying not to put unnecessary pressure on the CI, as we're looking to release later today.

Tishj · Feb 13 '24 12:02

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 30 days.

github-actions[bot] · Jul 02 '24 00:07

This issue was closed because it has been stale for 30 days with no activity.

github-actions[bot] · Aug 01 '24 00:08