COPY TO S3 with invalid URL takes a very long time to fail

Open hlcianfagna opened this issue 6 months ago • 3 comments

CrateDB version

5.10.10

CrateDB setup information

Single node and multi-node (see note about multi-node in the Actual Result section)

Problem description

When exporting data to S3 with COPY TO, it is easy to make a mistake in the URL. When this happens, CrateDB takes a very long time to report the error, which gives the impression that the export is in progress, so an admin only notices the mistake much later.

Steps to Reproduce

CREATE TABLE tbl1 (a integer);
INSERT INTO tbl1 SELECT 1;
REFRESH TABLE tbl1;
COPY tbl1 TO DIRECTORY 's3://invalid:[email protected]:80/nonexisting/hernantest' WITH (protocol = 'https', compression = 'gzip', format = 'json_object', wait_for_completion = 'true');

Actual Result

SdkClientException[Unable to execute HTTP request: Remote host terminated the handshake]

but only after ~15 minutes (and it seems that on multi-node clusters this delay may be multiplied by the number of nodes).

Expected Result

The same message but within a minute or so.

hlcianfagna avatar Jul 04 '25 14:07 hlcianfagna

Also, a KILL on a COPY TO stuck as described above seems to take ~17 minutes.

hlcianfagna avatar Jul 04 '25 15:07 hlcianfagna

For clarity, the mistake in the URL in this case is the port number: for protocol https it should have been 443. Not specifying the port number at all leads instead to a misleading AmazonS3Exception: Access Denied.
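
Until the failure is reported faster, a client-side pre-flight check can catch this particular mistake before the statement is sent. The following is a minimal sketch, not a CrateDB feature: the helper name check_s3_uri, the example host example.com, and the rule "an explicit port that is the default of the other protocol is suspicious" are all assumptions made for illustration.

```python
from urllib.parse import urlsplit

# Default ports assumed for the two values of the COPY TO `protocol` option.
DEFAULT_PORTS = {"http": 80, "https": 443}


def check_s3_uri(uri: str, protocol: str = "https") -> list[str]:
    """Return warnings for an s3:// URI (hypothetical helper, illustration only).

    An empty list means nothing suspicious was found; it does not prove
    that the URI is valid or reachable.
    """
    warnings = []
    parts = urlsplit(uri)

    if parts.scheme != "s3":
        warnings.append(f"unexpected scheme {parts.scheme!r}, expected 's3'")

    if protocol not in DEFAULT_PORTS:
        warnings.append(f"unknown protocol {protocol!r}")
        return warnings

    # Flag an explicit port that is the default port of the *other* protocol,
    # e.g. port 80 combined with protocol = 'https' as in this issue.
    port = parts.port
    if port is not None:
        for other, other_port in DEFAULT_PORTS.items():
            if other != protocol and port == other_port:
                warnings.append(
                    f"port {port} is the default for {other}, "
                    f"but protocol is {protocol} "
                    f"(default port {DEFAULT_PORTS[protocol]})"
                )
    return warnings


if __name__ == "__main__":
    # Hypothetical credentials and host, mirroring the shape of the URI above.
    for warning in check_s3_uri("s3://key:secret@example.com:80/bucket/path", "https"):
        print("WARNING:", warning)
```

The check is purely static: it never opens a connection, so it returns immediately, but it also cannot detect a wrong host name or invalid credentials.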

hlcianfagna avatar Jul 04 '25 15:07 hlcianfagna

It's possible to set a timeout on the AWS S3 SDK. This could become part of the WITH clause, e.g.:

COPY tbl1 TO DIRECTORY 's3://invalidurl...' WITH (timeout = '60s');

However, we would first need to check whether this is also applicable to all other COPY TO variants.

mkleen avatar Jul 08 '25 11:07 mkleen