duckdb
sniff_csv fails to detect compression when reading from a url that has a query string
What happens?
sniff_csv fails to detect compression when reading from a URL that has a query string.
For example, this statement
from sniff_csv('https://github.com/duckdb/duckdb/raw/main/data/csv/who.csv.gz?v=1');
fails with this error:
Error: Invalid Input Error: Invalid unicode (byte sequence mismatch) detected in value construction
but this statement succeeds:
from sniff_csv('https://github.com/duckdb/duckdb/raw/main/data/csv/who.csv.gz');
The root cause is in src/common/virtual_file_system.cpp at line 15 (https://github.com/duckdb/duckdb/blob/4d24f5c660a205bf22a7fd99e36efece798452c4/src/common/virtual_file_system.cpp#L15). It could be fixed by checking whether the path is a URL and, if so, removing the query string from it.
I will open a PR for this.
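To illustrate the idea behind the proposed fix, here is a minimal Python sketch of the "check for URL, then drop the query string" step. It is not DuckDB's code: the helper names `path_without_query` and `looks_gzipped` are hypothetical, and the sketch assumes that compression is inferred from the file extension and that only http(s) paths should have their query string removed.

```python
from urllib.parse import urlsplit


def path_without_query(path: str) -> str:
    """Drop the query string and fragment from http(s) URLs.

    Hypothetical helper for illustration only. Local paths are returned
    unchanged, so a file name that contains '?' is not altered.
    """
    parts = urlsplit(path)
    if parts.scheme in ("http", "https"):
        return parts._replace(query="", fragment="").geturl()
    return path


def looks_gzipped(path: str) -> bool:
    """Assumed extension check: treat a path ending in .gz as gzip-compressed."""
    return path_without_query(path).lower().endswith(".gz")


base = "https://github.com/duckdb/duckdb/raw/main/data/csv/who.csv.gz"
print(looks_gzipped(base))           # URL without a query string
print(looks_gzipped(base + "?v=1"))  # same URL with a query string
```

With the query string removed first, both URLs from this report end in .gz, so an extension-based check would treat them the same way.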
To Reproduce
Run this statement
from sniff_csv('https://github.com/duckdb/duckdb/raw/main/data/csv/who.csv.gz?v=1');
OS:
all
DuckDB Version:
0.9.3
DuckDB Client:
all
Full Name:
Gabriel Hodoroaga
Affiliation:
Bobsled
Have you tried this on the latest nightly build?
I have tested with a nightly build
Have you tried the steps to reproduce? Do they include all relevant data and configuration? Does the issue you report still appear there?
- [X] Yes, I have
> I will open a PR for this.
Please don't do that today, we're trying to not put unnecessary pressure on the CI as we're looking to release later today
This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 30 days.
This issue was closed because it has been stale for 30 days with no activity.