Config file management

Open spacekookie opened this issue 7 years ago • 81 comments

(moderated summary by WG)

Context

Plan of Action

In-ecosystem resources

External inspiration

Challenges

  • Cross platform paths in config files. See also #10
  • Integrating information from environment, command line arguments, defaults, and cascading config files
  • Every existing application is probably littered with special cases, migration schemes and special rules for how settings are handled. I'm not sure it is possible to encode all this into an API without making it extremely hard to use.
  • Good error messages, see https://github.com/serde-rs/serde/issues/1184

@spacekookie's original post

In the first meeting, we discussed the task of file management on different platforms (particularly configurations) and how it can be made better.

@Eijebong summarised it like this

If I need a config file, I don't want to know that it should be ${XDG_CONFIG:${XDG_HOME}/.config:/home/${user}/.config on linux, %AppDir%/App/ on windows and something else on osx [...]

There is a crate for "determining system configuration" (app-dirs-rs) but it seems unmaintained and not up to date

spacekookie avatar Feb 20 '18 18:02 spacekookie

The configure crate by @withoutboats abstracts over configuration in general (as the name suggests), but doesn't seem to have an adapter for configuration files right now (except for Cargo.toml, but that's not the use case we have in mind I guess).

killercup avatar Feb 20 '18 19:02 killercup

I think it'd be good to lay out some requirements for a crate (or crates) that would fill this gap:

  • Accepts a file name to either load or save
  • Abstracts away platform-specific paths (i.e. as the application developer I only want to say "myapp.conf" and have the crate handle where and how "myapp.conf" gets loaded/saved)

I'd be OK with having multiple sub crates for various platforms, and then a "parent" crate that allows abstracting over the platform where all I do is specify a file name I want to load/save.

kbknapp avatar Feb 20 '18 19:02 kbknapp

Bonus if the crate can allow me to search custom directories, or tweak the order (on platforms where applicable).
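A minimal sketch of the "search custom directories, in an order I choose" idea, using only the standard library. The helper name `find_config` is hypothetical and does not come from any crate mentioned in this thread:

```rust
use std::path::PathBuf;

/// Hypothetical helper: return the first `<dir>/<file_name>` that exists as a
/// file, trying `search_dirs` in exactly the order the caller supplied.
fn find_config(file_name: &str, search_dirs: &[PathBuf]) -> Option<PathBuf> {
    search_dirs
        .iter()
        .map(|dir| dir.join(file_name))
        .find(|candidate| candidate.is_file())
}
```

The caller would pass the platform defaults first or last depending on which should win; the sketch deliberately leaves that policy to the application.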

kbknapp avatar Feb 20 '18 19:02 kbknapp

but doesn't seem to have an adapter for configuration files right now (except for Cargo.toml, but that's not the use case we have in mind I guess).

Nope, the configure crate's default "source" is definitely designed for use cases where the person configuring the application is also the author - such as network services. However, the intent is for libraries to use configure, so that the application author can have total control over the source of configuration.

A configuration source that integrates with configure and is designed for CLIs would be a great addition, and possibly one I'd be interested in upstreaming into configure proper.

withoutboats avatar Feb 20 '18 20:02 withoutboats

I'm thinking this issue might be part of a bigger topic.

  • How do we store user-created configuration files cross-platform? (So far this issue has only touched on this one specifically. E.g. Linux $HOME/.config)
  • Where do we store temporary files cross-platform? (Does not need to persist between boots, e.g. Linux /tmp.).
  • How do we store essential user data files cross-platform? (Stuff like login tokens, usually generated by applications. E.g. $HOME/.local on Linux.)
  • How do we store non-essential data files cross-platform? (Needs to persist between boots, e.g. Linux $HOME/.cache.)

It's probably uncommon for a single application to use all of these, but one or more should be common enough. It feels like these questions are part of the same problem; perhaps it might be useful to consider all of these questions as part of this discussion?
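To make the four categories concrete, here is a rough sketch for Linux only, assuming the XDG base directory defaults (`~/.config`, `~/.local/share`, `~/.cache`) and `/tmp` as the temporary-file fallback. `DirKind` and `linux_dir` are made-up names, and other platforms would need their own rules:

```rust
use std::path::PathBuf;

/// The four kinds of location listed above.
#[derive(Debug, Clone, Copy, PartialEq)]
enum DirKind {
    Config,
    Temp,
    Data,
    Cache,
}

/// Hypothetical resolver for Linux only. `env` is injected so the lookup can
/// be exercised without touching the real process environment.
fn linux_dir(kind: DirKind, env: &dyn Fn(&str) -> Option<String>) -> Option<PathBuf> {
    // Treat unset and empty variables the same way.
    let var = |key: &str| env(key).filter(|v| !v.is_empty()).map(PathBuf::from);
    let home = var("HOME");
    match kind {
        DirKind::Config => var("XDG_CONFIG_HOME").or_else(|| home.map(|h| h.join(".config"))),
        DirKind::Data => var("XDG_DATA_HOME").or_else(|| home.map(|h| h.join(".local/share"))),
        DirKind::Cache => var("XDG_CACHE_HOME").or_else(|| home.map(|h| h.join(".cache"))),
        DirKind::Temp => Some(var("TMPDIR").unwrap_or_else(|| PathBuf::from("/tmp"))),
    }
}
```

In a real program the `env` argument could simply be `&|k: &str| std::env::var(k).ok()`.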

yoshuawuyts avatar Feb 21 '18 12:02 yoshuawuyts

For the question of temporary files there is already a crate which seems to do its job quite nicely (though I've only used it in limited scenarios so far, maybe it can be improved!)

As for the rest…I think it would be pretty cool if we could create (or find and improve existing) crates that mirror the same behaviour for other configuration, essential and non-essential data files as well.

It should be as simple as saying Configdir::new("my_app_name") and being able to write and read configurations from it.

Edit Just as I hit "Comment" I found this crate here

spacekookie avatar Feb 21 '18 12:02 spacekookie

Hi, @soc! The Rust CLI working group is talking about cross-platform configuration file management and your directories crate has come up. Looking at your GitHub profile, I see you have a Java directories package as well, so you seem to have some expertise in this area. Wanna chime in here? :)

killercup avatar Feb 21 '18 13:02 killercup

@killercup Sure, how can I help?

soc avatar Feb 21 '18 21:02 soc

@soc awesome! We were currently doing some research about the status quo of crates that are useful when writing CLI tools, work cross-platform and are maintained. For example, we want to come up with a good story around how to easily configure a CLI tool—with config files, env vars, and CLI flags. This issue is focussing on the handling of config files. @kbknapp already listed some good requirements in https://github.com/rust-lang-nursery/cli-wg/issues/7#issuecomment-367085114.

Do you think directories is a good foundation here? What are your plans for it? Can we help you get it to 1.0? :)

(@spacekookie and @yoshuawuyts probably have more to say!)

killercup avatar Feb 21 '18 21:02 killercup

For example, we want to come up with a good story around how to easily configure a CLI tool—with config files, env vars, and CLI flags.

directories is intentionally focused solely on dealing with operating system defaults. The reasoning for this is not because I believe that other venues for configuration are not important, but to provide the most minimal, focused and stable API I can get away with.

For instance, when dealing with CLI flags, the first issue you have is that of style (-h and --help vs. -help; -xyz vs -x, -y, -z; key=value vs. key value; and that's just Linux/macOS ... Windows has its own, different rules with /h etc.). There is potentially a lot of complexity and moving parts involved when trying to provide a CLI interface that makes everyone happy.

Do you think directories is a good foundation here?

I do think that directories is a good foundation for dealing with the operating system standards part of your goals.

I believe that dealing with CLI flags should probably be done in a separate library, or in a way more specific to the individual application's needs, because dealing with CLI flags is very application-specific.

In the end individual applications already need to have some custom code anyway to deal with migrating from storing their data directly in $HOME to following the platform standards. Dealing with CLI flags will probably be the same.

That's why directories only tells developers which directories they should be using, but does not get involved with creating directories itself, or making decisions about the priority of multiple directories (for instance platform defaults vs. CLI flags vs. config files).

Application-specific code will be required to handle such issues, and I want directories to avoid getting involved in that: Often the cost of complexity to solve such issues in a general fashion in a library is way higher than dealing with it on the application side, especially when handling (legacy) applications with their own folder in $HOME – without breaking things for existing users.
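As an illustration of the kind of application-side code meant here, this is a small sketch of one possible migration policy: keep using a legacy folder in $HOME if the user already has one, otherwise use the platform-standard location (for example one reported by directories). The function name and the policy itself are assumptions, not something directories provides:

```rust
use std::path::{Path, PathBuf};

/// Hypothetical application-side policy: prefer an existing legacy directory
/// so current users are not broken, else fall back to the standard location.
/// Nothing is created on disk; the caller decides whether and when to do that.
fn effective_config_dir(legacy: &Path, standard: &Path) -> PathBuf {
    if legacy.is_dir() {
        legacy.to_path_buf()
    } else {
        standard.to_path_buf()
    }
}
```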

Here is an example of an application that makes use of directories (the JVM version) and deals with migration compatibility, property files, and application-specific env vars: https://github.com/coursier/coursier/pull/676.

What are your plans for it? Can we help you get it to 1.0? :)

My plan is to declare it as stable as fast as possible. I think the main blockers are

  • having more people use and test it, to make sure it works
  • a thorough review of the decisions made concerning the various paths chosen (by someone who isn't me): https://github.com/soc/directories-rs/issues/2
  • a review of the Windows-specific code: https://github.com/soc/directories-rs/issues/1

soc avatar Feb 22 '18 10:02 soc

I have created tickets for the remaining issues I mentioned: https://github.com/soc/directories-rs/issues/1 and https://github.com/soc/directories-rs/issues/2.

soc avatar Feb 22 '18 11:02 soc

A more general note: There is a vast difference between selecting and standardizing on crates that provide certain functionality (like CLI parsing, config file parsing) and having one standardized way of handling application configuration:

With the former you probably get crates that do almost everything and allow configuration of almost everything.

With the latter, you want to be highly selective and make actual choices how things can be specified, and not allow a free for all in terms of decisions a developer can make.

soc avatar Feb 22 '18 11:02 soc

As you've noticed, I've opened some issues at directories-rs. I'd hold off on releasing a 1.0 before there are some consumers of the crate.


There is a vast difference between selecting and standardizing on crates that provide certain functionality (like CLI parsing, config file parsing) and having one standardized way of handling application configuration

Absolutely. We already have some great libraries for CLI args, and I'd love to have an equally good story for dealing with config files. That is not one crate – it's several built on top of and complementing each other :)

(We'll hopefully see more concrete proposals for this in #6!)

killercup avatar Feb 22 '18 12:02 killercup

I think the focus should be less on a config file format and more on an API to get to those files. As a developer I might still want to be able to choose a format, say json or toml or ini via whatever serde backend exists to read/write my configuration files. But I don't want to have to worry about where to put it.

Not sure why you brought up CLI parsing. Although thinking about it now, I'm not sure how clap.rs handles windows arguments :sweat_smile:

I haven't had a chance to play around with your crate yet but from the README it looks like it already exposes pretty much all the directory paths we might be interested in. At that point it becomes a question of making the API more ergonomic. i.e. maybe there could be a function to easily list configuration files for the given application (or None if there are none), etc

spacekookie avatar Feb 22 '18 12:02 spacekookie

Not sure why you brought up CLI parsing.

I brought it up, sorry :)

So, I've been thinking about what an all-around config solution might look like. We should not implement such a thing right now, but discuss what needs to happen to get there!


Here's a small proposal that integrates ideas from clap (v3, this is future!) and configure to get the discussion going:

#[derive(Debug, Deserialize, Clap, Configure)]
#[config(prefix = "diesel")]
struct Args {
    #[clap(short = "q", long = "quiet")]
    quiet: bool,
    #[clap(long = "database-url")]
    database_url: String,
    #[clap(subcommands)]
    command: DieselCliCommand, // an enum defining subcommands with their own fields and attributes
}

fn main() {
    let args = Args::configure()
        .read_from(configure::adaptors::config_file::toml("diesel_cli.toml")) // Invokes serde
        .read_from(configure::adaptors::env_file()) //  dotenv
        .read_from(configure::adaptors::env()) // std::env
        .read_from(configure::adaptors::clap_auto_init()); // Clap incl. early exit on `-h` and stuff like that
}

You can then:

  • pass --database-url=something.sqlite
  • execute the program with env DIESEL_DATABASE_URL=something.sqlite
  • Have a .env file with DIESEL_DATABASE_URL=something.sqlite
  • Have a ~/.config/diesel_cli.toml file

Is that approximately the direction in which you want to go? What needs to happen to get there?

killercup avatar Feb 22 '18 12:02 killercup

I think the CLI/conf/env story should be in another issue.

TeXitoi avatar Feb 22 '18 13:02 TeXitoi

Sure, that was just for inspiration and to set some context. (If you have other use cases/ideas, please tell us :))

killercup avatar Feb 22 '18 13:02 killercup

I have a couple of requests in the structopt issues about that (no ideas, but people wanting something like https://github.com/rust-lang-nursery/cli-wg/issues/7#issuecomment-367673115)

TeXitoi avatar Feb 22 '18 13:02 TeXitoi

Like @spacekookie said, I think it should focus on abstracting over platform-specific issues and not on the format, or on providing a "key->value" style API.

As the application writer, I want to just specify a file name, and let this crate handle where to store it. I then worry about formats, reading/writing, etc.

Then later on someone could write a generic crate to abstract over this configure crate, using something like serde to give a key->value style API.

Here's how I see the crate structure playing out (note, the crate names are just generic and not referring to anything existing right now).

config

kbknapp avatar Feb 22 '18 16:02 kbknapp

At a former employer, I wrote a config file management library (in Python) that turned out to be popular with my fellow developers (because it was easy to add to an existing project) and with our operations staff (because all our tools worked the same way, and the configuration was flexible enough for most of our use-cases). It worked like this:

  • an application would include a configuration file containing all the generic defaults, in the Python ConfigParser format (basically, an INI file)
  • the application calls into the library, passing the application name and the defaults file
  • the library reads the defaults
  • the library reads /etc/xdg/$appname/config.cfg (the standard XDG system-wide config directory, if it exists) and overlays those settings on top of the defaults
  • the library reads ~/.config/$appname/config.cfg (the standard XDG per-user config directory, if it exists) and overlays those settings on top of the defaults
  • For each section and key in the combined configuration, the library would build a string like $appname_$section_$key and upper-case it. If an environment variable with that name existed, its value would replace the value loaded from the config files
  • the library returns the fully-populated configuration data to the application
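The steps above could be sketched in Rust roughly as follows. This is not the original Python library; the file parsing is left out, the layers arrive as already-parsed maps, and `Settings`, `overlay`, `env_name` and `load` are names invented for the sketch:

```rust
use std::collections::BTreeMap;

/// Settings keyed by (section, key), as in an INI file.
type Settings = BTreeMap<(String, String), String>;

/// Overlay `upper` on top of `base`: keys present in `upper` win,
/// everything else is inherited from `base`.
fn overlay(mut base: Settings, upper: &Settings) -> Settings {
    for (k, v) in upper {
        base.insert(k.clone(), v.clone());
    }
    base
}

/// Build the environment variable name `$appname_$section_$key`, upper-cased.
fn env_name(app: &str, section: &str, key: &str) -> String {
    format!("{}_{}_{}", app, section, key).to_uppercase()
}

/// Hypothetical entry point: defaults, then system-wide, then per-user,
/// then environment variables. `env` is injected for testability.
fn load(
    app: &str,
    defaults: Settings,
    system: &Settings,
    user: &Settings,
    env: &dyn Fn(&str) -> Option<String>,
) -> Settings {
    let mut merged = overlay(overlay(defaults, system), user);
    // Only keys that already exist in the merged config can be overridden.
    for ((section, key), value) in merged.iter_mut() {
        if let Some(from_env) = env(&env_name(app, section, key)) {
            *value = from_env;
        }
    }
    merged
}
```

Because only known keys are probed in the environment, the defaults file doubles as the list of everything that can be configured.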

Pros:

  • The application only needs to call a single function.
  • The defaults file can contain all the possible config settings, example values and even documentation for them
    • maybe not the best possible location for such things, but still better than "scattered across the application in each bit of code that reads a config variable"
  • System-wide configuration is useful for provisioning tools like Ansible/Puppet/Chef/etc. that deploy the tool to automatically configure it to work on that host (for example, setting a default HTTP proxy, or picking a geographically close rendezvous server)
  • Per-user configuration means users can set things up the way they like them
  • Environment variable configuration means one tool can launch another tool and force some configuration option it needs (like output format, or log file name)
  • At each stage, you can override some defaults without replacing all of them
    • for example, you can use Ansible/Puppet/Chef/etc. to change the "default HTTP proxy" in the system-wide config without worrying about users who have created their own config missing out
  • Because the INI format is limited, overlaying configuration files is simple, with predictable results

Cons:

  • It can be difficult to fit an application's configuration needs into the limited vocabulary of INI files
  • With no configuration schema, the application has to do all the deserialization/validation work itself
  • If a config section foo has key bar_baz, and section foo_bar has key baz, both will be mapped to the environment variable $appname_FOO_BAR_BAZ and there isn't really any way around that.
  • There's no good way to extend this to CLI parsing, not least because it would be impossible to generate decent --help output from the limited information we have

If I were to attempt something similar in Rust:

  • I'd try to find a way to build it around a configuration schema object (following the model of serde and structopt) instead of tossing around raw config files
  • I'd probably leave the environment-variable config the same; it's clunky but we didn't need it very often, so it wasn't a problem in practice
  • I'd absolutely 100% require the ability to overlay config files for different sources, though—I don't know how that would work exactly with richer data formats than INI, but I'd have to find a way
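One possible answer to the "richer data formats" question is a recursive merge: tables are merged key by key, and anything else is replaced by the higher-priority value. The sketch below uses a tiny stand-in `Value` type invented for the example; replacing lists wholesale is just one policy among several, and a real implementation would more likely build on a serde-based value type:

```rust
use std::collections::BTreeMap;

/// A tiny stand-in for a richer config data model (hypothetical).
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Scalar(String),
    List(Vec<Value>),
    Table(BTreeMap<String, Value>),
}

/// Overlay `upper` onto `base`. Tables merge key by key; anything else
/// (scalars, lists, or mismatched types) is replaced wholesale by `upper`.
fn merge(base: Value, upper: Value) -> Value {
    match (base, upper) {
        (Value::Table(mut b), Value::Table(u)) => {
            for (key, upper_val) in u {
                let merged = match b.remove(&key) {
                    Some(base_val) => merge(base_val, upper_val),
                    None => upper_val,
                };
                b.insert(key, merged);
            }
            Value::Table(b)
        }
        (_, upper) => upper,
    }
}
```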

Screwtapello avatar Mar 02 '18 12:03 Screwtapello

I think the focus should be less on a config file format and more on an API to get to those files.

One reason to consider a standard config file format, or at least a standard config data model: on Windows, perhaps the standard configuration source could/should be the Registry, rather than the filesystem?

Screwtapello avatar Mar 02 '18 13:03 Screwtapello

After some research on that, it seems that most developers recommend and prefer files over the registry:

  • https://softwareengineering.stackexchange.com/questions/144238/ini-files-or-registry-or-personal-files
  • https://blog.codinghorror.com/was-the-windows-registry-a-good-idea/

soc avatar Mar 02 '18 14:03 soc

Since this thread is about the location of config files rather than their contents this may be a bit off topic, but here goes anyway:

Similar to how structopt works, I'd love to do

#[derive(Structconfig)]
pub struct Config {
    timeout: u8,
    #[structconfig(name="retries", default=3)]
    no_of_retries: u8,
    files: Vec<PathBuf>,
}

and have all the config stuff taken care of for me!

Edit:

I'd try to find a way to build it around a configuration schema object (following the model of serde and structopt) instead of tossing around raw config files

Didn't see that it had already been suggested.

richard-uk1 avatar Mar 02 '18 21:03 richard-uk1

@Screwtapello

How did your code deal with first run, if there wasn't a config file? Did it assume you wanted to use the defaults, or did it exit and prompt you to create a config file? (or did it walk you through creating the config file interactively?)

richard-uk1 avatar Mar 02 '18 22:03 richard-uk1

@derekdreery

At first run, it would use the defaults. For the various tools we created, every config option always had a sensible out-of-the-box default. Things the program absolutely could not know without asking would generally be command-line arguments, not config options.

It's a big world, and I'm sure there are some potential config options that cannot possibly have a sensible default, but I can't think of one right now. If anyone has an example, I'd love to hear it.

Screwtapello avatar Mar 02 '18 22:03 Screwtapello

Another aspect of config management to consider is passwords. Looks like there is a keyring crate that could use some polish and advertising.

epage avatar Mar 09 '18 17:03 epage

Hey everyone. I'm the current dev lead of conda, which is a cross-platform, system-level package manager. Currently written in Python--but we're in the initial stages of considering transitioning key pieces to Rust.

Just wanted to add to this discussion how we do configuration, because it's been powerful and has worked out very well. It's also very similar to what @Screwtapello described.

For each invocation of our executable, we build up a configuration context object from four sources of configuration information:

  1. hard-coded default values
  2. (potentially multiple) configuration files, including support for files in ".d" directories
  3. environment variables
  4. command line flags

These are linearized in a way that the configuration sources conceptually closest to the process invocation take precedence. That is, if a configuration parameter is provided as a CLI flag, but also provided in a configuration file, the CLI-provided value would win. I guess the insight here is that most CLI applications deal with at least one configuration file, environment variables, and CLI flags anyway, and we've just realized that they all represent basically the same type of information, and can be generalized and unified.

One capability that was especially important for us to add was the ability for sysadmins to lock down configuration for the entire system in "lower-level" read-only files. As we merge the sources of configuration information, we provide a flag sort of like the css !important that lets the lower-level value be the final value.
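A rough Rust sketch of that precedence-plus-lock idea, not conda's actual implementation: layers are passed from lowest to highest precedence, and a value flagged as important in a lower layer cannot be replaced by a higher one. The `Entry` type and its `important` field are assumptions made for the example:

```rust
use std::collections::BTreeMap;

/// One value in one configuration layer. `important` mimics the CSS-like
/// marker described above.
#[derive(Debug, Clone, PartialEq)]
struct Entry {
    value: String,
    important: bool,
}

type Layer = BTreeMap<String, Entry>;

/// Merge layers given from lowest precedence (defaults) to highest (CLI
/// flags). A later layer normally wins, unless an earlier layer already
/// marked the key as important, in which case the earlier value is final.
fn resolve(layers: &[Layer]) -> BTreeMap<String, String> {
    let mut merged: BTreeMap<String, Entry> = BTreeMap::new();
    for layer in layers {
        for (key, entry) in layer {
            let locked = merged.get(key).map_or(false, |e| e.important);
            if !locked {
                merged.insert(key.clone(), entry.clone());
            }
        }
    }
    merged.into_iter().map(|(k, e)| (k, e.value)).collect()
}
```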

I don't want to go into too much detail here. There's a blog post with more details, including how we deal with merging sequence and map-type configuration parameters. I did want to point all this out though as support for the usefulness of what @Screwtapello described.

kalefranz avatar Mar 13 '18 02:03 kalefranz

As we merge the sources of configuration information, we provide a flag sort of like the css !important that lets the lower-level value be the final value.

An alternative model that achieves the same goal is to have separate "config" and "override" files:

  • system-wide config
    • per-user config
      • environment variables and command-line flags
    • per-user overrides
  • system-wide overrides

The advantage over an !important flag is that you don't need special syntax in your config-file format (and therefore serde, etc.) while the disadvantage is that you have nearly twice as many config locations to document, and for users to check when diagnosing surprising behaviour.

Screwtapello avatar Mar 13 '18 02:03 Screwtapello

the disadvantage is that you have nearly twice as many config locations to document, and for users to check when diagnosing surprising behaviour

@Screwtapello you could mitigate this by being able to generate something like the following

# Configs - some config option
 1. There was no value at system-wide level. *value = default*
 2. Found value *newvalue* at user level *value = newvalue*
 3. There was no value at env/cli level *value = newvalue*
 4. There was no value at user override level *value = newvalue*
 5. Found value *newvalue2* at system override level *value = newvalue2*
 6. Final value for *config option* is *newvalue2*
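A report like the one above could be produced by a small function along these lines. This is only a sketch: `explain` is a made-up name, the levels are passed in already resolved to "found a value or not", and the wording of the lines is simplified:

```rust
/// Hypothetical report generator: `levels` are ordered from lowest to highest
/// precedence, each with the value found at that level (if any).
fn explain(option: &str, default: &str, levels: &[(&str, Option<&str>)]) -> Vec<String> {
    let mut current = default.to_string();
    let mut lines = Vec::new();
    for (i, (level, found)) in levels.iter().enumerate() {
        match found {
            Some(v) => {
                current = v.to_string();
                lines.push(format!(
                    "{}. Found value {} at {} level (value = {})",
                    i + 1, v, level, current
                ));
            }
            None => lines.push(format!(
                "{}. There was no value at {} level (value = {})",
                i + 1, level, current
            )),
        }
    }
    lines.push(format!(
        "{}. Final value for {} is {}",
        levels.len() + 1, option, current
    ));
    lines
}
```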

richard-uk1 avatar Mar 13 '18 15:03 richard-uk1

One option would be to have an API like

Config::from(system_overrides, commandline, environment, config_file, legacy_config_file, system)

Where people describe the order of settings they want to have and the library resolves settings in that order until a value is found.

I believe having some hard-coded, common-sense lookup scheme would be nice, but I fear that many applications would not fit well into it.

I think an additional bit that's important to get right is to track the origin of each setting, so that people don't end up with some_setting = "value", but some_setting = ("value", source). I think this would make it way more transparent to understand and debug where settings come from.
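To illustrate, here is a minimal sketch of ordered lookup with origin tracking. The `Source` variants mirror the arguments in the `Config::from(...)` line above; everything else (`Layer`, `lookup`) is hypothetical:

```rust
use std::collections::HashMap;

/// Where a setting came from.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Source {
    SystemOverrides,
    CommandLine,
    Environment,
    ConfigFile,
    LegacyConfigFile,
    System,
}

type Layer = HashMap<String, String>;

/// Walk the layers in the order given and return the first value found,
/// together with the source it came from.
fn lookup<'a>(order: &'a [(Source, Layer)], key: &str) -> Option<(&'a str, Source)> {
    order
        .iter()
        .find_map(|(source, layer)| layer.get(key).map(|v| (v.as_str(), *source)))
}
```

The application chooses the order by how it builds the slice, so no lookup scheme is hard-coded in the sketch.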

soc avatar Mar 14 '18 22:03 soc