iceberg-python

improve pyiceberg CLI

Open kevinjqliu opened this issue 10 months ago • 5 comments

Feature Request / Improvement

Based on issues described in #1771

  1. We should make it clear that the default catalog is used when no --catalog parameter is given. For example, pyiceberg list uses the default entry in the .pyiceberg.yaml file.

  2. We should fix how the CLI handles the order of the parameters passed to it. For example, pyiceberg list --catalog hive does not override the catalog, but pyiceberg --catalog hive list does.

kevinjqliu • Mar 11 '25 17:03

Hi, I can work on this issue. Could you assign the issue to me?

iting0321 • Mar 13 '25 14:03

sure @iting0321 happy to help review :)

kevinjqliu • Mar 14 '25 23:03

Hi, I have some questions.
If the command is pyiceberg list, I need to read the default entry in the catalog. However, what if default is not set in the catalog?

Additionally, if the command is pyiceberg list --catalog hive, should I simply return a command order error, or should I fall back to the default catalog and return the result as if the command were pyiceberg list?


Also, I would like to know whether you can provide an example of .pyiceberg.yaml that I can test locally. I am a bit confused about the content of .pyiceberg.yaml. For example, can we set the same uri prefix for both hive and default?

catalog:
  hive:
    uri: thrift://localhost:9083
    s3.endpoint: http://localhost:9100
    s3.access-key-id: admin
    s3.secret-access-key: adminadmin
    s3.region: us-east-1
  default:
    uri: thrift://default-catalog:9083

iting0321 • Mar 16 '25 14:03

@iting0321 here's the current documentation for the CLI: https://py.iceberg.apache.org/cli/

In general, the CLI requires a connection to the catalog. This can be done by passing the catalog configs via parameters, such as pyiceberg --uri ... list, or by reading from the config file (~/.pyiceberg.yaml). By default, the CLI will read the default entry in the config file. To read other entries, you can use pyiceberg --catalog foo list.

> However, what if default is not set in the catalog?

This should error, because the CLI cannot connect to any catalog.
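The lookup described above (use the entry named by --catalog, otherwise fall back to default, and error if the entry is missing) could be sketched roughly like this. This is only an illustration, not the pyiceberg implementation: `resolve_catalog_config` and `DEFAULT_CATALOG_NAME` are hypothetical names, and the config is passed in as an already-parsed dict rather than read from ~/.pyiceberg.yaml.

```python
from typing import Any, Dict, Optional

# Hypothetical constant: the entry used when no --catalog is given.
DEFAULT_CATALOG_NAME = "default"


def resolve_catalog_config(
    config: Dict[str, Any], catalog_name: Optional[str] = None
) -> Dict[str, Any]:
    """Return the properties of the requested catalog entry.

    `config` is assumed to have the same shape as .pyiceberg.yaml:
    a top-level "catalog" mapping of entry name -> properties.
    """
    catalogs = config.get("catalog") or {}
    name = catalog_name or DEFAULT_CATALOG_NAME
    if name not in catalogs:
        # No matching entry means there is nothing to connect to.
        raise ValueError(
            f"Catalog '{name}' is not configured; "
            f"available entries: {sorted(catalogs)}"
        )
    return catalogs[name]


example_config = {
    "catalog": {
        "hive": {"uri": "thrift://localhost:9083"},
        "default": {"uri": "thrift://default-catalog:9083"},
    }
}

# No name given: falls back to the "default" entry.
default_props = resolve_catalog_config(example_config)
# Explicit name: behaves like `--catalog hive`.
hive_props = resolve_catalog_config(example_config, "hive")
```

Listing the available entry names in the error message is a design choice of this sketch; it would also help with point 1 of the issue by making it visible which entry the CLI tried to use.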

> if the command is pyiceberg list --catalog hive

It would be nice to not enforce the order of the parameters. I think pyiceberg list --catalog hive should work the same as pyiceberg --catalog hive list.
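One framework-agnostic way to get this order-independence is to normalize the argument list before parsing, so that top-level options always come before the subcommand. The sketch below is only an assumption about how this could be done, not how pyiceberg does it: `hoist_global_options` and `GLOBAL_OPTIONS_WITH_VALUE` are hypothetical names, and only options that take a value (such as --catalog and --uri) are handled.

```python
from typing import List, Sequence, Set

# Hypothetical set of top-level options that each take one value.
GLOBAL_OPTIONS_WITH_VALUE: Set[str] = {"--catalog", "--uri"}


def hoist_global_options(
    argv: Sequence[str],
    global_options: Set[str] = GLOBAL_OPTIONS_WITH_VALUE,
) -> List[str]:
    """Move top-level options (and their values) to the front of argv."""
    hoisted: List[str] = []
    rest: List[str] = []
    i = 0
    while i < len(argv):
        arg = argv[i]
        # Support both `--catalog hive` and `--catalog=hive`.
        name = arg.split("=", 1)[0]
        if name in global_options:
            hoisted.append(arg)
            if "=" not in arg and i + 1 < len(argv):
                # The next token is this option's value.
                hoisted.append(argv[i + 1])
                i += 1
        else:
            rest.append(arg)
        i += 1
    return hoisted + rest


# Both orderings normalize to the same argument list.
after = hoist_global_options(["list", "--catalog", "hive"])
before = hoist_global_options(["--catalog", "hive", "list"])
```

A limitation of this approach is that it cannot tell a top-level option apart from a subcommand option of the same name, so declaring the option on each subcommand through the CLI framework may be the cleaner fix.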

> Also, I would like to know whether you can provide an example of .pyiceberg.yaml that I can test locally. I am a bit confused about the content of .pyiceberg.yaml. For example, can we set the same uri prefix for both hive and default?

Your example looks correct. You can set the same uri if you like. hive and default are just names you give to the specific configs. You can call them whatever you want, as long as you refer to them by the same name in the CLI command.

kevinjqliu • Mar 17 '25 17:03

This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in the next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.

github-actions[bot] • Nov 15 '25 00:11