Python: Add configuration

Open Fokko opened this issue 3 years ago • 0 comments

This PR will add the option to read from a configuration file and environment variables. This can be used to read the catalog configuration from a file instead of having to pass it through the CLI or Python.

I looked at how several other projects handle configuration:

  • https://docs.dask.org/en/stable/configuration.html
  • https://nvidia.github.io/spark-rapids/docs/configs.html
  • https://pandas.pydata.org/pandas-docs/stable/user_guide/options.html
  • https://docs.python.org/3/library/configparser.html

Most of them use (a variation of) dot-notation config. It also looks like Dask plugins can access the configuration, which is nice 👍🏻

Python itself comes with a configparser: https://docs.python.org/3/library/configparser.html

But it uses sections, which isn't compatible with dot-notation config such as pyiceberg.catalog.uri=thrift://. The Java implementation also uses dotted config, so staying consistent with that is nice as well.
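To make the mismatch concrete, here is a minimal sketch using only the standard library. The helper name `parse_ini` is hypothetical and only for illustration; it is not part of this PR. It shows that configparser rejects a flat dotted key without a section header, and that adding a section changes the shape of the result.

```python
import configparser


def parse_ini(text: str) -> dict:
    """Parse INI text into {section: {key: value}} (hypothetical helper)."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {section: dict(parser[section]) for section in parser.sections()}


# A flat dotted key with no [section] header is rejected by configparser.
try:
    parse_ini("pyiceberg.catalog.uri=thrift://\n")
    flat_ok = True
except configparser.MissingSectionHeaderError:
    flat_ok = False

# With a section it parses, but the key is now split over section + option.
sectioned = parse_ini("[catalog]\nuri=thrift://\n")
```

So the dotted form would have to be mapped onto sections by hand, which is the awkward part.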

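YAML fits this nested layout well. Note that Python does not ship a YAML parser in the standard library, so a third-party parser such as PyYAML is needed. The current implementation looks like this: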


catalog:
   production:
      uri: thrift://prod:9083
   rest-dev:
      uri: http://server.io
      credential: sometoken
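As a rough sketch of how the nested file relates to dotted keys: the dict below mirrors the YAML above as a loader would typically return it, and `get_dotted` is a hypothetical helper (not the actual implementation in this PR) that walks it with a dotted path.

```python
from collections.abc import Mapping
from typing import Any

# Mirrors the YAML above as a nested mapping (written by hand here so the
# example does not depend on a YAML library).
CONFIG = {
    "catalog": {
        "production": {"uri": "thrift://prod:9083"},
        "rest-dev": {"uri": "http://server.io", "credential": "sometoken"},
    }
}


def get_dotted(config: Mapping, key: str, default: Any = None) -> Any:
    """Look up a dotted key such as 'catalog.production.uri' (hypothetical)."""
    node: Any = config
    for part in key.split("."):
        # Stop early if the path runs into a leaf value or a missing key.
        if not isinstance(node, Mapping) or part not in node:
            return default
        node = node[part]
    return node


prod_uri = get_dotted(CONFIG, "catalog.production.uri")
```

This keeps the file readable while still allowing lookups in the dotted style that the Java implementation uses.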

You can also override the config using environment variables:

PYICEBERG__CATALOG_PRODUCTION_URI=thrift://dev:9083
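A minimal sketch of how such an override could be merged into the file config. This is an assumption about the mechanics, not the code in this PR: the prefix `PYICEBERG__` and the single-underscore path separator are read off the example variable above, and `apply_env_overrides` is a hypothetical name.

```python
import copy
from typing import Any, Dict, Mapping

PREFIX = "PYICEBERG__"  # assumption: taken from the example variable above


def apply_env_overrides(
    config: Mapping[str, Any], environ: Mapping[str, str]
) -> Dict[str, Any]:
    """Return a copy of config with PYICEBERG__* variables applied (hypothetical)."""
    merged = copy.deepcopy(dict(config))
    for name, value in environ.items():
        if not name.startswith(PREFIX) or name == PREFIX:
            continue
        # Assumption: "_" separates path segments, e.g. CATALOG_PRODUCTION_URI
        # becomes ["catalog", "production", "uri"].
        parts = name[len(PREFIX):].lower().split("_")
        node = merged
        for part in parts[:-1]:
            child = node.get(part)
            if not isinstance(child, dict):
                child = {}
                node[part] = child
            node = child
        node[parts[-1]] = value
    return merged


base = {"catalog": {"production": {"uri": "thrift://prod:9083"}}}
env = {"PYICEBERG__CATALOG_PRODUCTION_URI": "thrift://dev:9083", "HOME": "/root"}
merged = apply_env_overrides(base, env)
```

In real use `os.environ` would be passed as `environ`. One caveat of splitting on a single underscore is that keys which themselves contain a separator, such as `rest-dev`, cannot be addressed unambiguously, so the separator choice probably deserves some thought.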

Fokko, Aug 10 '22 11:08