fix(cli/rest) Support Glue REST operations with Iceberg-Go CLI

Open maninc opened this issue 6 months ago • 4 comments

Motivation

Support the AWS Glue Iceberg REST endpoint from the Iceberg-Go CLI. Currently, querying Iceberg tables in Glue through the CLI fails with a Forbidden error.

https://docs.aws.amazon.com/glue/latest/dg/connect-glu-iceberg-rest.html

Fix:

  • Added a RestCatalogConfig struct to the CatalogConfig struct.
  • RestCatalogConfig holds the REST catalog configuration properties.
  • These values are applied before the REST catalog is created.
  • AWS environment credentials are used when the Credentials property is not set.
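As a rough illustration of the list above, here is a minimal sketch of how the nested config and the credential fallback might look. The struct fields, the `credentialSource` helper, and its return values are assumptions made for illustration and are not taken from the actual iceberg-go code; only the `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` variable names are the standard AWS ones.

```go
package main

import (
	"fmt"
	"os"
)

// RestCatalogConfig is a hypothetical shape for the REST-specific settings.
// Field names mirror the keys in rest_config.yaml, not the real iceberg-go type.
type RestCatalogConfig struct {
	SigV4Region  string // "sigv4-region"
	SigV4Service string // "sigv4-service"
	Credential   string // empty means the property was not set
}

// CatalogConfig is likewise illustrative; only the fields used here are shown.
type CatalogConfig struct {
	Type       string
	URI        string
	Warehouse  string
	RestConfig RestCatalogConfig
}

// credentialSource reports where credentials would come from:
// the config when Credential is set, otherwise the AWS environment variables.
// getenv is injected so the lookup can be replaced in tests.
func credentialSource(cfg CatalogConfig, getenv func(string) string) string {
	if cfg.RestConfig.Credential != "" {
		return "config"
	}
	if getenv("AWS_ACCESS_KEY_ID") != "" && getenv("AWS_SECRET_ACCESS_KEY") != "" {
		return "environment"
	}
	return "none"
}

func main() {
	cfg := CatalogConfig{
		Type:      "rest",
		URI:       "https://glue.us-east-1.amazonaws.com/iceberg",
		Warehouse: "YOUR_AWS_ACCOUNT_ID",
		RestConfig: RestCatalogConfig{
			SigV4Region:  "us-east-1",
			SigV4Service: "glue",
		},
	}
	// The result depends on the environment the program runs in.
	fmt.Println("credentials from:", credentialSource(cfg, os.Getenv))
}
```

Passing the lookup function as a parameter keeps the fallback logic independent of the process environment, which makes it straightforward to cover in a unit test.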

Testing:

  • Added a unit test for the new configuration.
  • Used the rest_config.yaml below to query Iceberg tables in AWS Glue:
catalog:
  default:
    type: rest
    uri: https://glue.us-east-1.amazonaws.com/iceberg
    region: us-east-1
    warehouse: YOUR_AWS_ACCOUNT_ID
    rest-config:
      sigv4-region: us-east-1
      sigv4-service: glue

maninc avatar Jun 12 '25 01:06 maninc

I'm doubtful about this approach. Can we pass these settings via flags instead? This makes the CLI less unified; I think it's better to stay consistent and use flags.

Thanks @laskoviymishka for reviewing!

Yes, my initial approach was also to keep the CLI arguments and the config file as close as possible. I used this approach because we might otherwise end up passing too many arguments.

Do you think passing these REST config arguments as JSON would be easier?

maninc avatar Jun 16 '25 17:06 maninc

I think this is still a problem. From what I see in other Glue clients, parameters are typically passed as CLI flags instead of a JSON file. CLI flags are more self-describing (you can view them directly with --help) and much more convenient to use.

Currently, using a custom JSON schema adds an unnecessary layer of complexity - it's less clear to the user and makes maintenance more cumbersome.

Could we consider following the pattern used by other tools in the ecosystem?

See, for example, the Iceberg Catalog Migrator:

java -jar iceberg-catalog-migrator-cli.jar migrate \
  --source-catalog-type GLUE \
  --source-catalog-properties warehouse=s3a://example-bucket/gluecatalog/,io-impl=org.apache.iceberg.aws.s3.S3FileIO \
  --target-catalog-type NESSIE \
  --target-catalog-properties uri=http://...,warehouse=s3a://...,io-impl=...

Another option is to simply pass these as AWS_* environment variables, as is done in https://py.iceberg.apache.org/cli/

laskoviymishka avatar Jun 16 '25 18:06 laskoviymishka

I agree with @laskoviymishka that we shouldn't be passing custom JSON stuff and should instead use flags or just follow whatever pattern pyiceberg is using.

zeroshade avatar Jun 16 '25 21:06 zeroshade

Thank you @laskoviymishka @zeroshade for the suggestions. I have updated the code to follow the pattern used in the Iceberg Catalog Migrator.

Successfully tested the CLI using the command below:

./iceberg list --catalog rest --uri https://glue.us-east-1.amazonaws.com/iceberg --warehouse <ACCOUNT_ID> --rest-config sigv4-region=us-east-1,sigv4-service=glue
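For readers wondering how a comma-separated value like the one passed to --rest-config could be turned into properties, the following is a minimal sketch using only the Go standard library. The `parseProps` function is a hypothetical helper written for this illustration; it is not the code from the PR, and it assumes that neither keys nor values contain commas.

```go
package main

import (
	"fmt"
	"strings"
)

// parseProps splits "k1=v1,k2=v2" into a map.
// Hypothetical helper: values containing commas are not supported.
func parseProps(s string) (map[string]string, error) {
	props := make(map[string]string)
	if strings.TrimSpace(s) == "" {
		return props, nil // an empty flag value yields no properties
	}
	for _, pair := range strings.Split(s, ",") {
		// Cut splits on the first "=" only, so values may contain "=".
		key, value, ok := strings.Cut(pair, "=")
		key = strings.TrimSpace(key)
		if !ok || key == "" {
			return nil, fmt.Errorf("invalid property %q: expected key=value", pair)
		}
		props[key] = strings.TrimSpace(value)
	}
	return props, nil
}

func main() {
	props, err := parseProps("sigv4-region=us-east-1,sigv4-service=glue")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(props["sigv4-region"], props["sigv4-service"])
}
```

Rejecting malformed pairs up front gives the user an immediate error at the command line instead of a less obvious failure later when the request is signed.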

maninc avatar Jul 07 '25 03:07 maninc