java.net.UnknownHostException during Cluster Snapshot upload

Open lobrandon1217 opened this issue 6 months ago • 4 comments

A java.net.UnknownHostException occurs when the cluster snapshot is uploaded to an internal S3-compatible service. The endpoint hostname is being modified: the bucket name is prepended to it. In this example, the configured endpoint is "aws.s3.endpoint":"https://service.data.company.com", but the AWS SDK fails to resolve starrocks.service.data.company.com.
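
The hostname in the error is what you would expect from virtual-hosted-style addressing, where the bucket name becomes a subdomain of the endpoint. The sketch below only illustrates the two S3 addressing conventions. It is not the AWS SDK's or StarRocks' actual code, and the function name `s3_request_url` and the object key are made up for the example.

```python
from urllib.parse import urlsplit


def s3_request_url(endpoint: str, bucket: str, key: str, path_style: bool) -> str:
    """Build the URL a client would request under each S3 addressing style."""
    parts = urlsplit(endpoint)
    if path_style:
        # Path-style: the host is unchanged; the bucket is the first path segment.
        return f"{parts.scheme}://{parts.netloc}/{bucket}/{key}"
    # Virtual-hosted-style: the bucket is prepended to the host as a subdomain.
    return f"{parts.scheme}://{bucket}.{parts.netloc}/{key}"


endpoint = "https://service.data.company.com"
# "meta/image" is a placeholder key used only for illustration.
virtual = s3_request_url(endpoint, "starrocks", "meta/image", path_style=False)
path = s3_request_url(endpoint, "starrocks", "meta/image", path_style=True)
print(virtual)  # host becomes starrocks.service.data.company.com
print(path)     # host stays service.data.company.com
```

If the internal service only has a DNS record for service.data.company.com, the first form cannot resolve, which matches the exception reported here.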

Steps to reproduce the behavior (Required)

  1. Create a storage volume (in this example, it is built-in)
describe storage volume builtin_storage_volume\G
*************************** 1. row ***************************
     Name: builtin_storage_volume
     Type: S3
IsDefault: true
 Location: s3://starrocks
   Params: {"aws.s3.access_key":"******","aws.s3.secret_key":"******","aws.s3.endpoint":"https://service.data.company.com","aws.s3.region":"","aws.s3.use_instance_profile":"false","aws.s3.use_aws_sdk_default_behavior":"false"}
  Enabled: true
  Comment: 
1 row in set (0.00 sec)
  2. Enable snapshots
ADMIN SET AUTOMATED CLUSTER SNAPSHOT ON

Expected behavior (Required)

Snapshot uploaded to S3 bucket.

Real behavior (Required)

mysql> SELECT * FROM information_schema.cluster_snapshot_jobs;
+------------------------------------------+----------+---------------------+---------------+-------+-------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| SNAPSHOT_NAME                            | JOB_ID   | CREATED_TIME        | FINISHED_TIME | STATE | DETAIL_INFO | ERROR_MESSAGE                                                                                                                                                                              |
+------------------------------------------+----------+---------------------+---------------+-------+-------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| automated_cluster_snapshot_1750189655708 | 11828821 | 2025-06-17 19:47:35 | NULL          | ERROR |             | upload image failed, err msg: Failed to copy local /opt/starrocks/fe/meta/image to s3://starrocks/4670ab84-aaaf-4606-a370-b762c206b7fe/meta/image/automated_cluster_snapshot_1750189655708 |
...
java.net.UnknownHostException: getFileStatus on s3://starrocks/4670ab84-aaaf-4606-a370-b762c206b7fe/meta/image/automated_cluster_snapshot_1750191657846: 
software.amazon.awssdk.core.exception.SdkClientException: Received an UnknownHostException when attempting to interact with a service. 
See cause for the exact endpoint that is failing to resolve. If this is happening on an endpoint that previously worked, there may be a network connectivity issue or your DNS cache could be storing endpoints for too long.:    software.amazon.awssdk.core.exception.SdkClientException: Received an UnknownHostException when attempting to interact with a service. 
See cause for the exact endpoint that is failing to resolve. If this is happening on an endpoint that previously worked, there may be a network connectivity issue or your DNS cache could be storing endpoints for too long.: starrocks.service.data.company.com

StarRocks version (Required)

3.5.0-10d7323

lobrandon1217 avatar Jun 17 '25 21:06 lobrandon1217

@lobrandon1217 is this endpoint https://service.data.company.com a virtual-hosted-style endpoint for the bucket, or a general endpoint for all buckets?

kevincai avatar Jun 20 '25 01:06 kevincai

@lobrandon1217 is this endpoint https://service.data.company.com a virtual-hosted-style endpoint for the bucket, or a general endpoint for all buckets?

It is a general endpoint for all buckets

lobrandon1217 avatar Jun 20 '25 01:06 lobrandon1217

Try adding aws.s3.enable_path_style_access = true to the storage volume's params and see if it works.

kevincai avatar Jun 20 '25 01:06 kevincai

@kevincai Not sure if I am doing something wrong, but the setting does not work for me.

For testing, I created a new storage volume:

CREATE STORAGE VOLUME backups 
TYPE = S3 
LOCATIONS = ('s3://starrocks-dev-backups') 
PROPERTIES (
'enabled' = 'true', 
'aws.s3.endpoint' = 'https://service.data.company.com',
'aws.s3.use_instance_profile' = 'false', 
'aws.s3.use_aws_sdk_default_behavior' = 'false', 
'aws.s3.access_key' = '***', 
'aws.s3.secret_key' = '***', 
'aws.s3.enable_path_style_access' = 'true'
);

But the setting does not show up in the volume's Params:

describe storage volume backups;
+---------+------+-----------+----------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------+---------+
| Name    | Type | IsDefault | Location                   | Params                                                                                                                                                                                                                           | Enabled | Comment |
+---------+------+-----------+----------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------+---------+
| backups | S3   | false     | s3://starrocks-dev-backups | {"aws.s3.access_key":"******","aws.s3.secret_key":"******","aws.s3.endpoint":"https://service.data.company.com","aws.s3.region":"us-east-1","aws.s3.use_instance_profile":"false","aws.s3.use_aws_sdk_default_behavior":"false"} | true    |         |
+---------+------+-----------+----------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------+---------+
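
A quick way to confirm which requested properties were dropped is to compare the keys from the CREATE STORAGE VOLUME statement with the Params JSON that DESCRIBE returns. This is a minimal sketch: the JSON string is copied from the output above, and the helper name `missing_properties` is hypothetical. The 'enabled' property is left out because DESCRIBE reports it in its own column.

```python
import json

# Params column copied from the DESCRIBE STORAGE VOLUME output above.
params_json = (
    '{"aws.s3.access_key":"******","aws.s3.secret_key":"******",'
    '"aws.s3.endpoint":"https://service.data.company.com",'
    '"aws.s3.region":"us-east-1","aws.s3.use_instance_profile":"false",'
    '"aws.s3.use_aws_sdk_default_behavior":"false"}'
)

# Property keys passed in the CREATE STORAGE VOLUME statement.
requested = [
    "aws.s3.endpoint",
    "aws.s3.use_instance_profile",
    "aws.s3.use_aws_sdk_default_behavior",
    "aws.s3.access_key",
    "aws.s3.secret_key",
    "aws.s3.enable_path_style_access",
]


def missing_properties(params: str, keys: list[str]) -> list[str]:
    """Return the requested keys that are absent from the stored Params JSON."""
    stored = json.loads(params)
    return [k for k in keys if k not in stored]


missing = missing_properties(params_json, requested)
print(missing)
```

Here the only key missing from the displayed Params is aws.s3.enable_path_style_access. This shows only that the key is absent from the DESCRIBE output, not whether the value was stored internally.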

And the snapshot still fails with the same error:

ADMIN SET AUTOMATED CLUSTER SNAPSHOT ON STORAGE VOLUME backups;
SELECT * FROM information_schema.cluster_snapshot_jobs;

+------------------------------------------+---------+---------------------+---------------+-----------+-------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| SNAPSHOT_NAME                            | JOB_ID  | CREATED_TIME        | FINISHED_TIME | STATE     | DETAIL_INFO | ERROR_MESSAGE                                                                                                                                                                                          |
+------------------------------------------+---------+---------------------+---------------+-----------+-------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| automated_cluster_snapshot_1750386647443 | 1265993 | 2025-06-20 02:30:47 | NULL          | ERROR     |             | upload image failed, err msg: Failed to copy local /opt/starrocks/fe/meta/image to s3://starrocks-dev-backups/eea438b3-1321-45b2-862b-1bb4b9edcf12/meta/image/automated_cluster_snapshot_1750386647443 |
+------------------------------------------+---------+---------------------+---------------+-----------+-------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

Log file snippet:

java.net.UnknownHostException: getFileStatus on s3://starrocks-dev-backups/eea438b3-1321-45b2-862b-1bb4b9edcf12/meta/image/automated_cluster_snapshot_1750386647443: software.amazon.awssdk.core.exception.SdkClientException: Received an UnknownHostException when attempting to interact with a service. See cause for the exact endpoint that is failing to resolve. If this is happening on an endpoint that previously worked, there may be a network connectivity issue or your DNS cache could be storing endpoints for too long.:    software.amazon.awssdk.core.exception.SdkClientException: Received an UnknownHostException when attempting to interact with a service. See cause for the exact endpoint that is failing to resolve. If this is happening on an endpoint that previously worked, there may be a network connectivity issue or your DNS cache could be storing endpoints for too long.: starrocks-dev-backups.service.data.company.com
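
To check whether the host that fails to resolve is the configured endpoint with the bucket name prepended, the last token of the log line can be compared against the endpoint. This is a rough sketch that assumes the failing host is the text after the final ": " in the message, as in the snippet above. The log line is abbreviated, and both helper names are hypothetical.

```python
from urllib.parse import urlsplit

# Abbreviated tail of the log line above; the elided middle is not needed here.
log_line = (
    "java.net.UnknownHostException: getFileStatus on s3://starrocks-dev-backups/...: "
    "your DNS cache could be storing endpoints for too long.: "
    "starrocks-dev-backups.service.data.company.com"
)


def failing_host(line: str) -> str:
    """Return the text after the final ': ', assumed to be the unresolved host."""
    return line.rstrip().rsplit(": ", 1)[-1]


def is_bucket_prefixed(host: str, endpoint: str, bucket: str) -> bool:
    """True if host equals '<bucket>.<endpoint host>' (virtual-hosted style)."""
    return host == f"{bucket}.{urlsplit(endpoint).hostname}"


host = failing_host(log_line)
prefixed = is_bucket_prefixed(
    host, "https://service.data.company.com", "starrocks-dev-backups"
)
print(host, prefixed)
```

A True result suggests that path-style access is not in effect for the snapshot upload, even though the property was set when the volume was created.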

lobrandon1217 avatar Jun 20 '25 02:06 lobrandon1217

Hi @lobrandon1217, I am having the same issue. Did you find anything?

hasanozciftcii avatar Jul 04 '25 13:07 hasanozciftcii

This will be fixed in #62591.

kevincai avatar Sep 02 '25 13:09 kevincai