COPY TO S3 with invalid URL takes a very long time to fail
CrateDB version
5.10.10
CrateDB setup information
Single node and multi-node (see note about multi-node in the Actual Result section)
Problem description
When exporting data to S3 with COPY TO, it is easy to make a mistake in the URL. When this happens, CrateDB takes a very long time to report the error, which gives the impression that the export is in progress, so an admin only notices the mistake much later.
Steps to Reproduce
CREATE TABLE tbl1 (a integer);
INSERT INTO tbl1 SELECT 1;
REFRESH TABLE tbl1;
COPY tbl1 TO DIRECTORY 's3://invalid:[email protected]:80/nonexisting/hernantest' with (protocol = 'https', compression = 'gzip',format='json_object',wait_for_completion='true');
Actual Result
SdkClientException[Unable to execute HTTP request: Remote host terminated the handshake]
but only after ~15 minutes (it seems that on multi-node clusters this delay may be multiplied by the number of nodes).
Expected Result
The same message but within a minute or so.
Also, a KILL on a COPY TO that is stuck as described above seems to take ~17 minutes.
For clarity, the mistake in the URL in this case is the port number; for protocol https it should have been 443.
Not specifying the port number leads instead to a misleading AmazonS3Exception: Access Denied.
It's possible to set a timeout on the AWS S3 SDK. This could become part of the WITH clause, e.g.:
COPY tbl1 TO DIRECTORY 's3://invalidurl...' with (timeout='60s');
However, we would first need to check whether this is also applicable to all other COPY TO variants.
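As a rough illustration of how a timeout could be enforced independently of the S3 SDK (and therefore for other COPY TO variants as well), the sketch below wraps a blocking export call in an overall deadline using only the Java standard library. This is not CrateDB's implementation: the class name `CopyTimeoutSketch`, the method `callWithTimeout`, and the idea of mapping the proposed `timeout` option to a `Duration` are all assumptions made for illustration.

```java
import java.time.Duration;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch: bounds any blocking export task with an overall deadline.
public class CopyTimeoutSketch {

    /**
     * Runs the task on a worker thread and waits at most `timeout` for its result.
     * Throws TimeoutException if the deadline passes first.
     */
    public static <T> T callWithTimeout(Callable<T> task, Duration timeout) throws Exception {
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "copy-to-export");
            t.setDaemon(true); // do not keep the JVM alive for a stuck export
            return t;
        });
        Future<T> future = executor.submit(task);
        try {
            return future.get(timeout.toMillis(), TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // request interruption of the worker thread
            throw new TimeoutException("export did not complete within " + timeout);
        } catch (ExecutionException e) {
            // Surface the task's own failure (e.g. an SDK exception) to the caller.
            Throwable cause = e.getCause();
            if (cause instanceof Exception) {
                throw (Exception) cause;
            }
            throw e;
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        // A task that finishes in time returns its result normally.
        System.out.println(callWithTimeout(() -> "exported", Duration.ofSeconds(1)));

        // A task that hangs (standing in for a stuck handshake) fails at the deadline.
        try {
            callWithTimeout(() -> {
                Thread.sleep(5_000);
                return "never";
            }, Duration.ofMillis(200));
        } catch (TimeoutException e) {
            System.out.println("failed fast: " + e.getMessage());
        }
    }
}
```

One caveat with this approach: interrupting the worker thread may not abort blocking socket I/O, so the caller would get the error quickly while the underlying request could linger. Timeouts configured on the SDK client itself would likely still be needed to release the connection, and a prompt KILL would depend on the same mechanism.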