datacube-core

Configuration behaviour is confusing and poorly documented.

Open SpacemanPaul opened this issue 2 years ago • 20 comments

Expected behaviour

The behaviour of the $DATACUBE_DB_URL environment variable is clearly documented.

Actual behaviour

The $DATACUBE_DB_URL environment variable is not mentioned at all in the docs.

SpacemanPaul avatar May 19 '22 06:05 SpacemanPaul

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Sep 20 '22 20:09 stale[bot]

Let's expand the scope of this to cover documenting how ODC finds its configuration and determines which database to connect to.

The only existing documentation is Installing & Managing -> Configuring -> Database Setup -> Create Configuration File

I would expect to find information in the Data Access ... -> Connecting to ODC, but this page has a non-link to Setting up your environment. The link needs fixing, if that page even exists.

I think this should also be documented in the API docs for datacube.Datacube.

omad avatar Nov 02 '22 00:11 omad

The current behaviour is complicated and confusing, involving many possible environment variables and configuration files, on top of flags to CLI applications and arguments passed to datacube.Datacube() when using the Python API.

The precedence of configuration files is in https://github.com/opendatacube/datacube-core/blob/develop/datacube/config.py#L19-L30 , and the logic in the rest of config.py determines how they're used.

Configuration file precedence

  1. /etc/datacube.conf
  2. File referenced by the $DATACUBE_CONFIG_PATH environment variable.
  3. ~/.datacube.conf
  4. datacube.conf in the current working directory ($PWD)
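A minimal sketch of how that list of candidate files could be assembled, for anyone trying to work out which files are in play on their own machine. The helper name `candidate_config_paths` is hypothetical and is not part of datacube; the order simply mirrors the list above, and, per a later comment in this thread, later entries override earlier ones.

```python
import os
from pathlib import PurePosixPath
from typing import List, Mapping


def candidate_config_paths(env: Mapping[str, str], home: str, cwd: str) -> List[str]:
    """Return candidate config files in the order listed above.

    Hypothetical helper for illustration only; it is not datacube code.
    """
    candidates = [
        "/etc/datacube.conf",
        env.get("DATACUBE_CONFIG_PATH"),  # None when the variable is unset
        str(PurePosixPath(home) / ".datacube.conf"),
        str(PurePosixPath(cwd) / "datacube.conf"),
    ]
    # Drop the entry for an unset or empty DATACUBE_CONFIG_PATH.
    return [path for path in candidates if path]


if __name__ == "__main__":
    # Show which of the candidate files actually exist on this machine.
    for path in candidate_config_paths(os.environ, os.path.expanduser("~"), os.getcwd()):
        status = "found" if os.path.isfile(path) else "missing"
        print(f"{status:8} {path}")
```

Passing the environment, home directory and working directory as arguments keeps the function easy to test without touching the real filesystem.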

Environment Variables

  • DATACUBE_CONFIG_PATH
  • DATACUBE_ENVIRONMENT
  • DATACUBE_IAM_AUTHENTICATION
  • DATACUBE_DB_URL
  • DATACUBE_IAM_TIMEOUT
  • DB_HOSTNAME, DB_USERNAME, DB_PASSWORD, DB_DATABASE

These are on top of all the PostgreSQL configuration options, including ~/.pgpass, PGUSER, PGDATABASE, PGHOST, PGPORT, PGPASSWORD
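When debugging which of these settings might be influencing a connection, it can help to dump the ones that are actually set. The following is a rough sketch only: `report_config_env` is a hypothetical helper, the variable names are copied from the lists above, and the decision to hide passwords and the database URL is my own assumption about what should not be printed.

```python
import os
from typing import Dict, Mapping

# Variable names as listed in this thread.
ODC_VARS = [
    "DATACUBE_CONFIG_PATH", "DATACUBE_ENVIRONMENT", "DATACUBE_DB_URL",
    "DATACUBE_IAM_AUTHENTICATION", "DATACUBE_IAM_TIMEOUT",
    "DB_HOSTNAME", "DB_USERNAME", "DB_PASSWORD", "DB_DATABASE",
]
PG_VARS = ["PGUSER", "PGDATABASE", "PGHOST", "PGPORT", "PGPASSWORD"]

# Assumption: these may contain credentials, so their values are not echoed.
SECRET_VARS = {"DB_PASSWORD", "PGPASSWORD", "DATACUBE_DB_URL"}


def report_config_env(env: Mapping[str, str]) -> Dict[str, str]:
    """Return the configuration-related variables that are set in `env`."""
    report = {}
    for name in ODC_VARS + PG_VARS:
        if name in env:
            report[name] = "<set, hidden>" if name in SECRET_VARS else env[name]
    return report


if __name__ == "__main__":
    for name, value in report_config_env(os.environ).items():
        print(f"{name}={value}")
```

This only reports what is set; it does not say which value ODC or PostgreSQL will end up using, and it does not inspect ~/.pgpass.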

omad avatar Nov 02 '22 00:11 omad

I've found an "interesting" undocumented feature lurking in the ODC configuration code. If you run python -m datacube (or call the function datacube.config.auto_config()), a new configuration file will be written to ~/.datacube.conf based upon current environment variables. 🤯

There are a bunch of tests on how the configuration settings work in tests/test_config.py, but they don't include enough commentary to be useful as documentation.

omad avatar Nov 02 '22 00:11 omad

The documentation for the configuration file needs improving too. The canonical source in the docs seems to be Database Setup -> Create Configuration File.

It is mostly made up of a commented example configuration file, but doesn't mention db_port, or any of the supported AWS-specific authentication options. The section of code responsible appears to be datacube/drivers/postgres/_connections.py

omad avatar Nov 02 '22 01:11 omad

Oh my, there's more complexity in choosing the "Datacube Environment" to use.

The priority for choosing the environment is (to the best of my understanding):

  1. The env= value passed as an argument when creating Datacube().
  2. The environment variable DATACUBE_ENVIRONMENT
  3. An undocumented value stored in the configuration file/s:
       [user]
       default_environment=value
  4. default
  5. datacube

(See LocalConfig.__init__() in datacube/config.py)
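The priority order above can be sketched roughly as follows. This is not the code in datacube/config.py; `resolve_environment` is a hypothetical function, and treating "default" and "datacube" as config section names that are tried in that order is my assumption about what the last two entries mean.

```python
import configparser
from typing import Mapping, Optional


def resolve_environment(
    env_arg: Optional[str],
    os_env: Mapping[str, str],
    config: configparser.ConfigParser,
) -> str:
    """Pick the Datacube environment name following the priority list above."""
    # 1. Explicit env= argument.
    if env_arg:
        return env_arg
    # 2. DATACUBE_ENVIRONMENT environment variable.
    if os_env.get("DATACUBE_ENVIRONMENT"):
        return os_env["DATACUBE_ENVIRONMENT"]
    # 3. [user] default_environment in the configuration file/s.
    if config.has_option("user", "default_environment"):
        return config.get("user", "default_environment")
    # 4. and 5. Assumption: use a [default] section if present, else "datacube".
    if config.has_section("default"):
        return "default"
    return "datacube"


if __name__ == "__main__":
    cfg = configparser.ConfigParser()
    cfg.read_string("[user]\ndefault_environment = prod\n")
    print(resolve_environment(None, {}, cfg))
```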

Also: I don't think "Datacube Environment" is defined clearly anywhere.

omad avatar Nov 02 '22 03:11 omad

Okay, after thinking I understood this now, and tearing my hair out.... there's more.

The above logic is only called when creating a datacube.Datacube() in code.

If you're using the datacube CLI tools, they use completely different logic, see datacube.config.LocalConfig.find().

It respects the environment variables DATACUBE_DB_URL, DB_[USERNAME|HOSTNAME|PORT|DATABASE|PASSWORD], but doesn't support DATACUBE_CONFIG_PATH or DATACUBE_ENVIRONMENT!! Those must be passed as command line flags to the datacube command.

omad avatar Nov 02 '22 05:11 omad

Oh wow, I've been through this code fairly recently but didn't spot all of these gotchas.

SpacemanPaul avatar Nov 03 '22 01:11 SpacemanPaul

See also @Kirill888's comments on #1329 re: using postgres with conda.

SpacemanPaul avatar Nov 09 '22 22:11 SpacemanPaul

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Mar 18 '23 04:03 stale[bot]

Configuration file precedence

  1. /etc/datacube.conf
  2. File referenced by the $DATACUBE_CONFIG_PATH environment variable.
  3. ~/.datacube.conf
  4. datacube.conf in the current working directory ($PWD)

Note this list is in reverse precedence order - the ones lower in the list override the ones higher in the list. This means that the priority of the environment variable is surprisingly low - it should really have the highest precedence.

SpacemanPaul avatar Apr 12 '23 01:04 SpacemanPaul

the ones lower in the list override the one's higher in the list.

@SpacemanPaul can you confirm what granularity of the override is?

  • Whole file
  • Environment section
  • Individual settings

Is it fusing multiple configs "somehow", or does it just pick the last one found?

Kirill888 avatar Apr 12 '23 01:04 Kirill888

Also I'm guessing that $DATACUBE_CONFIG_PATH is really meant to be:

"alternative to global config that is usually in /etc/datacube.conf, but we can't have that on something like NCI, so we add environment variable for that."

Kirill888 avatar Apr 12 '23 03:04 Kirill888

the ones lower in the list override the one's higher in the list.

@SpacemanPaul can you confirm what granularity of the override is?

  • Whole file
  • Environment section
  • Individual settings

Is it fusing multiple configs "somehow" or just picks last found one?

I'm just starting to look at that now - but at first glance, yes, it looks like it might be fusing them (and there's even yet another base default config, not mentioned in this list, that is defined statically in the source code). Immediate progress will be slow because I'm about to go on leave, but cleaning this up for 1.9 is my next priority - I'll clean up the 1.8 documentation at the same time.
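To make the "granularity" question concrete, here is a small sketch of what fusing at the level of individual settings looks like with the standard library's configparser. It does not confirm that ODC 1.8 behaves this way; the section names, hostnames and the `merge_configs` helper are made up for illustration.

```python
import configparser

# Stand-in for a system-wide file read first (contents are invented).
SYSTEM_CONF = """
[default]
db_hostname = system-db.example
db_database = datacube
"""

# Stand-in for a user file read later (contents are invented).
USER_CONF = """
[default]
db_hostname = localhost

[dev]
db_database = scratch
"""


def merge_configs(*texts: str) -> configparser.ConfigParser:
    """Read several config sources into one parser, later sources winning."""
    parser = configparser.ConfigParser()
    for text in texts:
        parser.read_string(text)
    return parser


merged = merge_configs(SYSTEM_CONF, USER_CONF)
# db_hostname comes from the later source, db_database survives from the earlier one.
print(dict(merged["default"]))
print(merged.sections())
```

With this kind of merging, a later file overrides only the individual keys it sets; neither the whole earlier file nor the whole earlier section is discarded.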

SpacemanPaul avatar Apr 12 '23 04:04 SpacemanPaul

The above logic is only called when creating a datacube.Datacube() in code.

If you're using the datacube CLI tools, they use completely different logic, see datacube.config.LocalConfig.find().

Looks to me like Datacube() calls LocalConfig.find as well - see the normalise_config function in Datacube.__init__()?

SpacemanPaul avatar Apr 12 '23 05:04 SpacemanPaul

Then there's datacube.ui.task_app which contains a bunch of utilities and config options that do not appear to be actually used anywhere in core, or in odc-tools. It's not clear to me what other repos might be using these functions but gee it would be nice if I could just delete the whole file.

I think it's all tied up with the executor/worker framework which iirc we had already said we were going to remove. (We already partly removed celery support).

SpacemanPaul avatar Apr 13 '23 04:04 SpacemanPaul

Honestly, the behaviour in 1.8 is so complex, conditional and inconsistent, I'm not sure it's even possible to write complete, accurate and comprehensible documentation.

SpacemanPaul avatar Apr 20 '23 03:04 SpacemanPaul

This thread sums up nicely our "understanding" - we use everything but the IAM stuff at the moment, but that's because it's completely undocumented and I've not been brave enough to dig into the rabbit hole....

woodcockr avatar Apr 21 '23 00:04 woodcockr

Discussion continues: https://github.com/opendatacube/datacube-core/wiki/ODC-EP-010---Replace-Configuration-Layer

omad avatar Apr 21 '23 05:04 omad

EP-10 ^^ is now a mostly complete design. Thanks to Damien and Rob for their feedback while working through this.

I'm pretty sure the new design provides a way to do everything that the current system can do except merging multiple files. As I discuss there, with the way I have the config loading process laid out, I think it would be possible to do multiple file merging with only minor tweaks if we decide we can't do without it - but I think some of the other new features I've introduced make it unnecessary.

SpacemanPaul avatar Apr 24 '23 01:04 SpacemanPaul