Can't use AWS S3Tables with iceberg-rest-catalog

Open mildbyte opened this issue 7 months ago • 7 comments

I'm trying to configure the REST catalog as follows to read an AWS S3 Tables catalog:

use std::sync::Arc;

use aws_config::Region;
use iceberg_rust::{
    catalog::Catalog, object_store::ObjectStoreBuilder, spec::namespace::Namespace,
};

use iceberg_rest_catalog::apis::configuration::{AWSv4Key, Configuration};

async fn try_rest_catalog() {
    let configuration = Configuration {
        base_path: "https://s3tables.eu-west-2.amazonaws.com/iceberg".to_string(),
        aws_v4_key: Some(AWSv4Key {
            access_key: "ASIA-REDACTED".to_string(),
            secret_key: "REDACTED"
                .to_string()
                .into(),
            security_token: Some("REDACTED".to_string().into()),
            region: "eu-west-2".to_string(),
            service: "s3tables".to_string(),
        }),
        ..Configuration::default()
    };

    let builder = ObjectStoreBuilder::s3();

    let catalog = iceberg_rest_catalog::catalog::RestCatalog::new(
        Some("arn:aws:s3tables:eu-west-2:REDACTED:bucket/REDACTED"),
        configuration,
        builder,
    );

    let namespaces = catalog.list_namespaces(None).await.unwrap();
    for namespace in namespaces.iter() {
        println!("found namespace {}", namespace);
    }

    let namespace_to_list = Namespace::try_new(&vec!["public".to_string()]).unwrap();
    let tables = catalog.list_tabulars(&namespace_to_list).await.unwrap();

    for table in tables.iter() {
        println!("table {}", table.name())
    }
}

#[tokio::main]
async fn main() {
    tracing_subscriber::fmt::init();
    try_rest_catalog().await;
}

The security token is the session token I got via aws configure export-credentials. Without it, I was getting this error:

thread 'main' panicked at src/main.rs:64:58:
called `Result::unwrap()` on an `Err` value: InvalidFormat("Response status: 403 Forbidden, Response content: {\"message\":\"The security token included in the request is invalid.\"}")

I also applied this patch to iceberg-rest-catalog to get it to send the security token to AWS:

diff --git a/catalogs/iceberg-rest-catalog/src/apis/configuration.rs b/catalogs/iceberg-rest-catalog/src/apis/configuration.rs
index 003e506..a29b4b1 100644
--- a/catalogs/iceberg-rest-catalog/src/apis/configuration.rs
+++ b/catalogs/iceberg-rest-catalog/src/apis/configuration.rs
@@ -45,6 +45,7 @@ pub struct ApiKey {
 pub struct AWSv4Key {
     pub access_key: String,
     pub secret_key: SecretString,
+    pub security_token: Option<SecretString>,
     pub region: String,
     pub service: String,
 }
@@ -62,11 +63,18 @@ impl AWSv4Key {
             .body(body)
             .unwrap();
         let signing_settings = SigningSettings::default();
-        let signing_params = SigningParams::builder()
+
+        let mut builder = SigningParams::builder()
             .access_key(self.access_key.as_str())
             .secret_key(self.secret_key.expose_secret().as_str())
             .region(self.region.as_str())
-            .service_name(self.service.as_str())
+            .service_name(self.service.as_str());
+
+        if let Some(security_token) = &self.security_token {
+            builder = builder.security_token(security_token.expose_secret().as_str())
+        };
+
+        let signing_params = builder
             .time(SystemTime::now())
             .settings(signing_settings)
             .build()

However, now I'm getting a different error:

thread 'main' panicked at src/main.rs:63:58:
called `Result::unwrap()` on an `Err` value: InvalidFormat("Response status: 403 Forbidden, Response content: {\"message\":\"The request signature we calculated does not match the signature you provided. Check your AWS Secret Access Key and signing method. Consult the service documentation for details.\n\nThe Canonical String for this request should have been\n'GET\n/iceberg/v1/arn%253Aaws%253As3tables%253Aeu-west-2%253AREDACTED%253Abucket%252FREDACTED/namespaces\n\nhost:s3tables.eu-west-2.amazonaws.com\nx-amz-date:20250407T134321Z\nx-amz-security-token:REDACTED\n\nhost;x-amz-date;x-amz-security-token\ne3b0REDACTED'\n\nThe String-to-Sign should have been\n'AWS4-HMAC-SHA256\n20250407T134321Z\n20250407/eu-west-2/s3tables/aws4_request\n9d3bb64d8f7f42b9b0e7dfd04e4fc448b4008416e770b4c2c7c1bc547b626bf2'\n\"}")

Interestingly, it's using two x-amz-security-token headers, one with my session token and one with a random hex string.
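
For readers following the error above: the canonical string AWS prints has a fixed layout (method, path, query string, headers, signed-header list, payload hash). The sketch below, in plain std Rust, assembles a string of that shape to show that any difference in how the path is percent-encoded changes what gets signed. The function name `canonical_request`, the account ID, bucket name, and header values are all made up for illustration; this is not the code that aws-sigv4 or iceberg-rest-catalog runs. The last line of the layout is the payload hash, and `e3b0c442...` is the SHA-256 of an empty body, so that may be the hex string that looks like a second token in the message.

```rust
/// Assemble a SigV4-style canonical request from already-canonical parts.
/// `headers` are (lowercase name, trimmed value) pairs. Hypothetical helper,
/// written only to mirror the layout shown in the AWS error message.
fn canonical_request(
    method: &str,
    canonical_path: &str,
    canonical_query: &str,
    headers: &[(&str, &str)],
    payload_hash: &str,
) -> String {
    // Headers are listed in order of header name.
    let mut sorted: Vec<(&str, &str)> = headers.to_vec();
    sorted.sort_by_key(|h| h.0);

    // One "name:value\n" line per header.
    let canonical_headers: String = sorted
        .iter()
        .map(|(name, value)| format!("{name}:{value}\n"))
        .collect();

    // Semicolon-separated list of the header names that were signed.
    let signed_headers = sorted
        .iter()
        .map(|(name, _)| *name)
        .collect::<Vec<_>>()
        .join(";");

    format!(
        "{method}\n{canonical_path}\n{canonical_query}\n{canonical_headers}\n{signed_headers}\n{payload_hash}"
    )
}

fn main() {
    // SHA-256 of an empty request body.
    let empty_body_hash = "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855";

    // All values below are placeholders, not real credentials or resources.
    let headers = [
        ("x-amz-security-token", "EXAMPLE_SESSION_TOKEN"),
        ("host", "s3tables.eu-west-2.amazonaws.com"),
        ("x-amz-date", "20250407T134321Z"),
    ];
    let path = "/iceberg/v1/arn%253Aaws%253As3tables%253Aeu-west-2%253A123456789012%253Abucket%252Fexample-bucket/namespaces";

    println!("{}", canonical_request("GET", path, "", &headers, empty_body_hash));
}
```

If the client and the server build this string from differently encoded paths, the resulting signatures cannot match, whatever the credentials are.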

I can successfully read the catalog with the s3-tables catalog:

async fn try_s3tables_catalog() {
    let builder = ObjectStoreBuilder::s3();
    let sdk = aws_config::load_defaults(aws_config::BehaviorVersion::latest())
        .await
        .into_builder()
        .region(Region::new("eu-west-2"))
        .build();

    let catalog: Arc<dyn Catalog> = Arc::new(
        iceberg_s3tables_catalog::S3TablesCatalog::new(
            &sdk,
            "arn:aws:s3tables:eu-west-2:REDACTED:bucket/REDACTED",
            builder,
        )
        .unwrap(),
    );

    let namespaces = catalog.list_namespaces(None).await.unwrap();
    for namespace in namespaces.iter() {
        println!("found namespace {}", namespace);
    }

    let namespace_to_list = Namespace::try_new(&vec!["public".to_string()]).unwrap();
    let tables = catalog.list_tabulars(&namespace_to_list).await.unwrap();

    for table in tables.iter() {
        println!("table {}", table.name())
    }
}

Is accessing AWS S3 Tables through iceberg-rest-catalog supported, or am I only supposed to use the separate iceberg-s3tables-catalog?

mildbyte avatar Apr 07 '25 13:04 mildbyte

I'm looking into it. I didn't know that you could access the s3tables with the REST catalog. Generally the current design of the RestCatalog might not be ideal as the name of the catalog would be equal to the table bucket arn. It might be good to have some kind of name mapping. (One issue I saw in your code is that I think you have to urlencode the table bucket arn).

Here is the code that I tested that worked with the Glue Iceberg REST endpoint: https://github.com/JanKaul/frostbow/blob/main/frostbow/src/main.rs#L75

I'm gonna get back to you once I've tested the endpoint myself.

JanKaul avatar Apr 08 '25 12:04 JanKaul

Currently the prefix is not automatically url-encoded. In your code you define the catalog name as not url-encoded, but AWS expects it url-encoded. This could be one issue. I think it would make sense to automatically url-encode every catalog prefix.
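
A minimal sketch of what url-encoding the prefix means here, using only std Rust. The helper `percent_encode` and the ARN value are hypothetical; the actual crate may rely on a library function with a different reserved-character set.

```rust
/// Percent-encode every byte that is not an RFC 3986 "unreserved" character
/// (letters, digits, '-', '.', '_', '~'). Hypothetical helper for illustration.
fn percent_encode(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for byte in input.bytes() {
        match byte {
            b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'-' | b'.' | b'_' | b'~' => {
                out.push(byte as char)
            }
            _ => out.push_str(&format!("%{:02X}", byte)),
        }
    }
    out
}

fn main() {
    // Placeholder ARN; the account ID and bucket name are made up.
    let arn = "arn:aws:s3tables:eu-west-2:123456789012:bucket/example-bucket";

    // ':' becomes %3A and '/' becomes %2F, so the whole ARN fits in a
    // single path segment of the request URL.
    let prefix = percent_encode(arn);
    println!("{prefix}");
    // prints "arn%3Aaws%3As3tables%3Aeu-west-2%3A123456789012%3Abucket%2Fexample-bucket"
}
```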

JanKaul avatar Apr 08 '25 13:04 JanKaul

I think the problem is url-encoding the catalog prefix. I have a fix on this branch.

When I use that, I get some permission problem but I didn't want to go into setting up the right AWS permissions. But it looks like the signature is correct.

JanKaul avatar Apr 09 '25 10:04 JanKaul

> When I use that, I get some permission problem but I didn't want to go into setting up the right AWS permissions. But it looks like the signature is correct.

Thanks! I have the permissions set up in my environment, let me test it out and get back to you

mildbyte avatar Apr 09 '25 12:04 mildbyte

I think we're close but it's now double-URL-encoding the endpoint?

($ACCOUNT_ID, $BUCKET_NAME ... is me redacting the logs, not the values I'm sending)

{"message":"The request signature we calculated does not match the signature you provided. Check your AWS Secret Access Key and signing method. Consult the service documentation for details.

The Canonical String for this request should have been
'GET
/iceberg/v1/arn%25253Aaws%25253As3tables%25253Aeu-west-2%25253A$ACCOUNT_ID%25253Abucket%25252F$BUCKET_NAME/namespaces

host:s3tables.eu-west-2.amazonaws.com
x-amz-date:20250409T133139Z
x-amz-security-token:$MY_SESSION_TOKEN

host;x-amz-date;x-amz-security-token
e3b0c$REDACTED

The String-to-Sign should have been
'AWS4-HMAC-SHA256
20250409T133139Z
20250409/eu-west-2/s3tables/aws4_request
1b0dec47d0a25dbb0b92230fbd6b3b90f82ffd7a0105dba5f079bc8033dadef5'
"}

async fn try_rest_catalog() {
    let configuration = Configuration {
        base_path: "https://s3tables.eu-west-2.amazonaws.com/iceberg".to_string(),
        aws_v4_key: Some(AWSv4Key {
            access_key: "$MY_ACCESS_KEY".to_string(),
            secret_key: "$MY_SECRET_KEY"
                .to_string()
                .into(),
            session_token: Some("$MY_SESSION_TOKEN".to_string().into()),
            region: "eu-west-2".to_string(),
            service: "s3tables".to_string(),
        }),
        ..Configuration::default()
    };

    let builder = ObjectStoreBuilder::s3();

    let catalog = iceberg_rest_catalog::catalog::RestCatalog::new(
        Some("arn:aws:s3tables:eu-west-2:$ACC_ID:bucket/$BUCKET_NAME"),
        configuration,
        Some(builder),
    );

    let namespaces = catalog.list_namespaces(None).await.unwrap();
    for namespace in namespaces.iter() {
        println!("found namespace {}", namespace);
    }

    let namespace_to_list = Namespace::try_new(&vec!["public".to_string()]).unwrap();
    let tables = catalog.list_tabulars(&namespace_to_list).await.unwrap();

    for table in tables.iter() {
        println!("table {}", table.name())
    }
}

$ git status
On branch rest-url-encode
Your branch is up to date with 'upstream/rest-url-encode'.

nothing to commit, working tree clean
$ git rev-parse HEAD
5d9f3e9c4bd33f2c365c46fd30a958cd3697eeda

This is with iceberg-rest-catalog using aws-sigv4 0.3.1.

mildbyte avatar Apr 09 '25 14:04 mildbyte

You're right, I got carried away. With the last change we're url-encoding it twice. (It's good that we tried it though, because I realized we were also url-encoding the namespaces twice.)
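
A small sketch of how the encoding layers stack up, using only std Rust. Each extra pass turns `%` into `%25`, so `:` becomes `%3A`, then `%253A`, then `%25253A`. The helpers `percent_encode`, `percent_decode_once`, and `encoding_depth` are hypothetical names for illustration, not functions from iceberg-rest-catalog. One caveat, as far as the SigV4 documentation describes it: for services other than S3 the canonical path is encoded once more than the path actually sent, so the number of layers visible in the AWS error message is not necessarily the number the client applied.

```rust
/// Percent-encode every byte that is not an RFC 3986 "unreserved" character.
fn percent_encode(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for byte in input.bytes() {
        match byte {
            b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'-' | b'.' | b'_' | b'~' => {
                out.push(byte as char)
            }
            _ => out.push_str(&format!("%{:02X}", byte)),
        }
    }
    out
}

/// Value of a single ASCII hex digit, if it is one.
fn hex_val(b: u8) -> Option<u8> {
    match b {
        b'0'..=b'9' => Some(b - b'0'),
        b'a'..=b'f' => Some(b - b'a' + 10),
        b'A'..=b'F' => Some(b - b'A' + 10),
        _ => None,
    }
}

/// Remove exactly one layer of percent-encoding; malformed escapes are kept as-is.
fn percent_decode_once(input: &str) -> String {
    let bytes = input.as_bytes();
    let mut out = Vec::with_capacity(bytes.len());
    let mut i = 0;
    while i < bytes.len() {
        if bytes[i] == b'%' && i + 2 < bytes.len() {
            if let (Some(hi), Some(lo)) = (hex_val(bytes[i + 1]), hex_val(bytes[i + 2])) {
                out.push(hi * 16 + lo);
                i += 3;
                continue;
            }
        }
        out.push(bytes[i]);
        i += 1;
    }
    String::from_utf8_lossy(&out).into_owned()
}

/// Count how many decode passes it takes until the string stops changing.
fn encoding_depth(input: &str) -> usize {
    let mut current = input.to_string();
    let mut depth = 0;
    loop {
        let decoded = percent_decode_once(&current);
        if decoded == current {
            return depth;
        }
        current = decoded;
        depth += 1;
    }
}

fn main() {
    // Apply one, two, and three layers to a single ':'.
    let mut s = String::from(":");
    for layer in 1..=3 {
        s = percent_encode(&s);
        println!("{layer} layer(s): {s}");
    }

    // Fragments shaped like the ones in the two error messages in this thread.
    println!("{}", encoding_depth("arn%253Aaws")); // prints 2
    println!("{}", encoding_depth("arn%25253Aaws")); // prints 3
}
```

Running a check like `encoding_depth` on the path the client actually sends would be one way to tell whether a mismatch comes from the client, the signer, or both.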

It seems to be a problem with the list-namespaces endpoint. If I'm calling the create-namespace method, I don't get a signature error and the namespace is created correctly.

I don't really have an idea about why this is happening. It could somehow be related to the query-parameters, but we're actually not using any query parameters. I just saw this issue for the openapi-generator: https://github.com/OpenAPITools/openapi-generator/issues/12614.

Generally the aws-sigv4 dependency is pretty outdated. I just wanted to stay close to the openapi-generator output to reduce the maintenance burden.

JanKaul avatar Apr 10 '25 13:04 JanKaul

Interesting, thanks for investigating. We'll go with the dedicated iceberg-s3tables-catalog crate in that case, since it also supports other SDK auth methods.

mildbyte avatar Apr 23 '25 07:04 mildbyte