bug: compression=auto does not work when the file has no extension.

Open wubx opened this issue 3 months ago • 1 comments

Search before asking

  • [x] I had searched in the issues and found no similar issues.

Version

main

What's Wrong?

create file format mygzcsv     TYPE = CSV,
    RECORD_DELIMITER = '\n',
    FIELD_DELIMITER = ',',
    COMPRESSION = AUTO,
    SKIP_HEADER = 1;

select $1,$2,$3,$4,$5,$6,$7,$8 from @mystage/gzcsv/01csvgz (file_format=>'mygzcsv');
error: APIError: QueryFailed: [1046]Invalid value '�0D�~���iC���"x�B���!`�b��z���<fw�59o��z����f��o����>�m��i���$����������d�Jj�E��^AWG�' for column 0 (_$1 String NULL): Invalid Utf8, cause: invalid utf-8 sequence of 1 bytes from index 0
at file 'gzcsv/01csvgz', line 1

create or replace file format mygzcsv     TYPE = CSV,
    RECORD_DELIMITER = '\n',
    FIELD_DELIMITER = ',',
    COMPRESSION = gzip,
    SKIP_HEADER = 1;

select $1,$2,$3,$4,$5,$6,$7,$8 from @mystage/gzcsv/01csv.gz (file_format=>'mygzcsv')
This works as expected.

How to Reproduce?

Query any gzip-compressed CSV file whose name has no extension, using a file format with COMPRESSION = AUTO.
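
For anyone who wants a test file, here is a minimal sketch that produces one. It is only an illustration: the helper name `write_extensionless_gzip_csv` and the sample rows are made up, and only the file name `01csvgz` comes from the report above. Uploading the result to a stage is left out.

```python
import csv
import gzip
import io
from pathlib import Path


def write_extensionless_gzip_csv(path: Path, rows: list[list[str]]) -> None:
    """Serialize rows as CSV, gzip the bytes, and write them to `path`.

    Hypothetical helper for reproducing the issue; `path` is expected
    to have no suffix (e.g. "01csvgz").
    """
    buf = io.StringIO()
    # Use '\n' as the record delimiter to match RECORD_DELIMITER in the file format.
    csv.writer(buf, lineterminator="\n").writerows(rows)
    path.write_bytes(gzip.compress(buf.getvalue().encode("utf-8")))


if __name__ == "__main__":
    # One header row (skipped by SKIP_HEADER = 1) plus one data row, 8 columns each.
    header = [f"c{i}" for i in range(1, 9)]
    data = [str(i) for i in range(1, 9)]
    write_extensionless_gzip_csv(Path("01csvgz"), [header, data])
```

The resulting file starts with the gzip magic bytes `1f 8b`, which is what shows up as invalid UTF-8 in the error when the file is read without decompression.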

Are you willing to submit PR?

  • [ ] Yes I am willing to submit a PR!

wubx avatar Sep 26 '25 09:09 wubx

Suffix matching could be used for the implementation, but I would still like to discuss whether this fix is reasonable. After all, a file without an extension is not a normal file. Shouldn't we just let the program guess based on the suffix? @wubx

camilesing avatar Oct 25 '25 14:10 camilesing
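
To make the two options in this discussion concrete, here is a minimal sketch of suffix matching with a fallback to content sniffing. It is not Databend's implementation: `detect_compression`, `SUFFIXES`, and `MAGIC` are hypothetical names, and the codec list is only an example. The magic numbers themselves are the standard leading bytes of each format.

```python
import gzip
from typing import Optional

# Hypothetical suffix table: file name ending -> codec name.
SUFFIXES = {".gz": "gzip", ".bz2": "bz2", ".zst": "zstd", ".xz": "xz"}

# Standard leading bytes ("magic numbers") of each compressed format.
MAGIC = [
    (b"\x1f\x8b", "gzip"),
    (b"BZh", "bz2"),
    (b"\x28\xb5\x2f\xfd", "zstd"),
    (b"\xfd7zXZ\x00", "xz"),
]


def detect_compression(name: str, head: bytes) -> Optional[str]:
    """Guess the codec from the file name, then from the first bytes.

    `head` is the first few bytes of the file. Returns None when
    neither the suffix nor the content matches a known codec.
    """
    for suffix, codec in SUFFIXES.items():
        if name.endswith(suffix):
            return codec
    for magic, codec in MAGIC:
        if head.startswith(magic):
            return codec
    return None


if __name__ == "__main__":
    sample = gzip.compress(b"a,b\n1,2\n")
    print(detect_compression("gzcsv/01csv.gz", b""))          # matched by suffix
    print(detect_compression("gzcsv/01csvgz", sample[:8]))    # matched by content
    print(detect_compression("gzcsv/plain", b"a,b\n1,2\n"))   # no match
```

The trade-off is that sniffing needs to read the start of every file before choosing a decoder, and short magic numbers can in principle collide with plain text (a CSV whose first field begins with `BZh`, for example), so the suffix stays the first choice in this sketch.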