oauth2-proxy icon indicating copy to clipboard operation
oauth2-proxy copied to clipboard

RFC Introduce structural configuration (Deprecate flags and env?)

Open JoelSpeed opened this issue 4 years ago • 35 comments

Expected Behavior

Configuration should be simple and easy for users of the project to understand.

Current Behavior

We have lots of weird hacks that allow people to overload strings to force many different options within a single option.

For example, --whitelist-domains allows a URL such as example.com to match only that domain, but .example.com to match subdomains as well. Instead, if we had configuration that was structured, we could be move explicit such as:

whitelistDomains:
- host: example.com
  includeSubdomains: true|false

This would not only be more obvious to users when configuring, but would reduce complexity when parsing the config.

Possible Solution

I think we should implement a structured configuration file and break up options into sections and those which have been overloaded should be made into real objects with proper options so we don't have to parse lots of weird strings and interpret these.

We have already implemented Viper within the project, so we can use that to load one of many different structures. My suggestion would be YAML, but since viper can interpret lots, we could have an option for the config format and pass that through as well, allowing people to pick their favourite (I think that's too much personally though I'm willing to be persuaded).

As for how we do the transition, there are several options:

Just do it

This would involve, within a single release cycle, completely rewriting the configuration to include a structured configuration. This would be a major refactor of the codebase and if we did this, we would likely want to write some utility that loads the existing configuration and generates config files for the user to make this transition easier.

Do it slowly

This would involve introducing a new flag --structured-config, and over the period of a few releases, migrating small sections of the configuration to the new structured config, while leaving the old configuration as it is. Then once we have done the entire structure redesign, we deprecate the old format and help people to move.

This would slow down the pace of the migration but would add additional complexity within the project temporarily. As I see it, we would need two public structures (legacyConfig and structuredConfig) which would represent the two config file types users could use, plus an internal config type that would be used within the codebase and some way to merge the two public types into the internal type.

Slowly but still using a single configuration file

This would involve switching the format over in a breaking change, and then slowly making lots of breaking changes that adjust the format of the configuration file to eventually introduce the structure that we want.

Additional ideas

Env from config files

Currently, every option that we have can be specified via env. This wouldn't work if we had structure, how do you address a field within an object within a list for example.

Dex solved this issue by allowing users to write $ENV_VAR in their structured configuration and running the equivalent of an envsubst on the file before parsing it. This would allow those users who want to keep env vars for certain parts of their config (eg secrets) to specify their own env vars and have them loaded. This would also allow complex lists of complex objects to be written and include env vars where required. (Example in Dex docs)

Deprecate most flags

As above, with a structured configuration, we lose a lot of the ability to map 1:1 our options to flags. So I'm proposing that we basically deprecate all flags bar a very skeleton set (eg --config, --version) as part of this migration.

OpenAPI?

Another idea that crossed my mind, is that with a proper structured configuration, it might be good to include an OpenAPI schema. I have seen these used before for validation (which would be a win for us, less validation code, just farm that off to the OpenAPI schema validation) and also to be converted into autogenerated docs. I'm not sure exactly how this would be implemented, but generated docs would definitely make me a lot happier about the state of our documentation.

JoelSpeed avatar May 08 '20 11:05 JoelSpeed

Good idea. YAML can be developed as an alternative way to populate the internal config, thus minimizing impact, and eventually replace the custom .cfg format.

Just make sure to define an extensible YAML schema, e.g. avoid lists of strings when possible. For schema definition Cue is also a possibility as an alternative to OpenAPI.

rustyx avatar May 09 '20 11:05 rustyx

I like the idea. FWIW, I would prefer a single-cycle breaking change that leaves me with the new config on the other end and removes all the old string parsing code at the same time.

As for format, I'd probably favour toml over yaml. Viper supports both, so it would be more about what's easiest for humans to read/write.

How do you propose to handle secrets going forward? Retain flags / env or require the config file as a whole to be secret so that I can contain those, too?

wonderhoss avatar May 09 '20 11:05 wonderhoss

Just make sure to define an extensible YAML schema, e.g. avoid lists of strings when possible. For schema definition Cue is also a possibility as an alternative to OpenAPI.

@rustyx This is a good point about lists of strings. I think there are some cases where it might be ok (cookie domains springs to mind) but for the most part you're definitely right, by making lists of objects we can add new fields easily later

@gargath when you say single cycle, do you mean as in cutting over with a major change or one release sharing both configuration formats?

For format, we could potentially also include the ability for users to specify the format via a flag, allowing them all of vipers options? My only issue with that would be it's harder to read when someone posts their config for diagnosis

For secrets, my suggestion would be having env vars in the config and then us basically doing envsubst as Dex does. Alternatively, we could do flags still but I see this potentially causing issues where you need multiple complex objects in a list containing secrets, eg if we started supporting multiple providers simultaneously

JoelSpeed avatar May 09 '20 15:05 JoelSpeed

@JoelSpeed I meant changing over in a single release. Maybe a 6.x? Supporting both variants simultaneously might be quite painful to implement and is of limited benefit once the switchover is complete. To me, the biggest benefit of the change is increasing security and reducing potentials for mistakes in the config. People will have to change at some point anyways... Maybe instead of supporting both, how about committing to backporting security fixes to 5.x for a while so that people who don't want to change their config can stick with that?

As for secrets, I am not too familiar with how Dex does it. I would just prefer not having to keep the entire config file secret. If there is a way to somehow specify the actual secret values in env vars, that would be ideal. Don't really mind the specifics as long as it's documented.

As for the format: my 2 cents are that allowing all options Viper supports is probably overkill. Also makes documenting harder and I see your point about people posting theirs in different formats. Another downside is that searching Google for keywords from your flavour format might not bring up all relevant results. Finally, there's no other project out there - at least that I'm familiar with - where multiple flavours of config file are supported. I'd say let's stick to one.

wonderhoss avatar May 09 '20 21:05 wonderhoss

@JoelSpeed I meant changing over in a single release. Maybe a 6.x? Supporting both variants simultaneously might be quite painful to implement and is of limited benefit once the switchover is complete. To me, the biggest benefit of the change is increasing security and reducing potentials for mistakes in the config. People will have to change at some point anyways... Maybe instead of supporting both, how about committing to backporting security fixes to 5.x for a while so that people who don't want to change their config can stick with that?

This makes sense. My only concern is that in not having both versions supported, we will not only be asking users to upgrade (through various other changes) but also rewrite their config completely. In this case, how do they know that whether the code is broken now or just their config if they can't get the proxy working again?

As for secrets, I am not too familiar with how Dex does it. I would just prefer not having to keep the entire config file secret. If there is a way to somehow specify the actual secret values in env vars, that would be ideal. Don't really mind the specifics as long as it's documented.

I explained this briefly in the original comment (Env config from files), I think this solves the issue and as you say, will be documented

As for the format: my 2 cents are that allowing all options Viper supports is probably overkill. Also makes documenting harder and I see your point about people posting theirs in different formats. Another downside is that searching Google for keywords from your flavour format might not bring up all relevant results. Finally, there's no other project out there - at least that I'm familiar with - where multiple flavours of config file are supported. I'd say let's stick to one.

Makes sense, I'll look into the formats it supports then and see what I think about each, hopefully others have opinions. Since we've already mentioned YAML and TOML, what makes you prefer TOML over YAML?

JoelSpeed avatar May 10 '20 09:05 JoelSpeed

The yaml spec is notoriously complex and can yield unexpected results. There have been quite frequently bugs in yaml parsers. Given the choice and assuming it's expressive enough for what we need, I would prefer toml on those grounds. That said, it is only my tuppence. Maybe people (probably?) being more familiar with yaml might be a counter argument.

wonderhoss avatar May 10 '20 09:05 wonderhoss

Maybe people (probably?) being more familiar with yaml might be a counter argument.

The popularity of YAML and prevalence within the devops space was my main reason for choosing it, it will be easier for people to be familiar with. Though I hear your concerns about the spec and bugs

JoelSpeed avatar May 10 '20 10:05 JoelSpeed

Thanks very much for this.

Like the idea of envsubst mechanism for secrets (or anything) in the config file. We would continue to hydrate via Sealed Secrets controller. Not having this ability is a pain in the Prometheus AlertManager world.

Prefer toml or ini because yaml indentation is annoying. Especially when yaml is indented already like when embedded in a ConfigMap. See Ansible inventory file (with groups and vars) options for comparison between ini and yaml that might correlate to the "multiple provider" idea.

Could there be a configuration change only release to simplify migration? It could have its own config related bug fixes (if any). Fast followed by feature release perhaps but no pressure.

karlskewes avatar May 12 '20 18:05 karlskewes

+1 for structured configuration and YAML (I find TOML hard to read). Also +1 for releasing this as a single breaking-change release.

I think this gives the opportunity to lay groundwork for some frequently requested features such as:

  • multiple identity providers (#88)
  • rule-based access control (e.g. #204, #451, #466)

It could look like this:

identity_providers:
  google:
    type: google
  org_gitlab:
    type: gitlab
    endpoint: https://gitlab.example.com
rules:
  pre_auth:
    # skip authentication for requests from internal network
    - if: cidr_match(request.remote_addr, '172.17.0.0/8')
      set_policy: skip_auth
    # use org_gitlab as sole identity provider for staff tools
    - if: request.host == 'staff.example.com'
      set_identity_providers: [org_gitlab]
  post_auth:
    # limit access to staff tools to employees
    - if: request.host == 'staff.example.com' && !ends_with(user.email, '@example.com')
      set_policy: deny

fnkr avatar May 22 '20 10:05 fnkr

@fnkr thanks for linking those relevant issues. This is possibly one of my main drivers for suggesting structured configuration, to remove the weird syntaxes we are overloading in strings

I had a look into TOML to try and understand what our config might look like if we went that route, I have to say, I'm not a fan.

TOML array handling means that you can split an array anywhere within your config, entries in the array aren't necessarily grouped and to me, that seems wrong. I think that will make debugging config hard, it will be much easier to miss an array element with TOML than YAML

JoelSpeed avatar May 22 '20 10:05 JoelSpeed

This issue has been inactive for 60 days. If the issue is still relevant please comment to re-activate the issue. If no action is taken within 7 days, the issue will be marked closed.

github-actions[bot] avatar Sep 11 '20 00:09 github-actions[bot]

This is a bit of an epic rather than a single issue, we are actively working on this but it's going to take a number of months to complete

JoelSpeed avatar Sep 11 '20 09:09 JoelSpeed

This issue has been inactive for 60 days. If the issue is still relevant please comment to re-activate the issue. If no action is taken within 7 days, the issue will be marked closed.

github-actions[bot] avatar Nov 11 '20 00:11 github-actions[bot]

Instead of OpenAPI, I recommend using JSON Schema, which is based on it. OpenAAPI. OpenAPI is intended to describe the HTTP API only.

It is worth adding that the use of JSON Schema will cause that we will have reference documentation, For example: https://artifacthub.io/packages/helm/artifact-hub/artifact-hub?modal=values-schema

mik-laj avatar Dec 04 '20 16:12 mik-laj

This issue has been inactive for 60 days. If the issue is still relevant please comment to re-activate the issue. If no action is taken within 7 days, the issue will be marked closed.

github-actions[bot] avatar Feb 03 '21 00:02 github-actions[bot]

In the era of Docker and 12factor, I would argue against dropping the configurability via environment variables. I came across this project only yesterday, and just by putting together a couple of environment variables following the documentation, I was able to start a docker container that was doing what I needed. Had I had to write complex yaml and mount a configuration file into a container, I might have turned away and looked into another solution.

You might have a look at how Spring Boot solves this issue (1 and 2): A Spring Boot application can be configured using complex yaml, but every configuration option is also made available as environment variable (yes, even lists and hashes). This is much more user friendly imho.

netmikey avatar Feb 06 '21 12:02 netmikey

I'm not against trying to support something like springboot has, but I don't think it's a priority for us. I had a bit of a look around for packages that might implement this functionality that we could easily implement here but didn't find anything yet.

JoelSpeed avatar Feb 06 '21 17:02 JoelSpeed

I had a bit of a look around for packages that might implement this functionality that we could easily implement here but didn't find anything yet.

This is part of the OpenAPI standard, so it's very likely that someone has already implemented it. See: "deepObject" in OpenAPI spec.

mik-laj avatar Feb 06 '21 17:02 mik-laj

Do it slowly. Making major backward incompatible change, that required essential work for your clients will just stop many clients from upgrading to a new version. Also after such breaking release you can get overwhelmed number of bugs/complains, that it will be hard to support.

Run in parallel existing and new way of configuration, but new flags/configurations add only to the new structural config. It will force clients that need new features to start use new format, but allow others to continue use existing format as long as they want.

MNF avatar Mar 11 '21 19:03 MNF

We are currently working on the migration and cleaning up a lot of the code along the way. To enable this, we have created a number of internal conversion options.

This has allowed us to start introducing the new config as an --alpha-config option for users who want to try this ahead of time. I think we will continue with this method and eventually get to a point where the whole configuration is supported as structured and switch the conversion logic, ie, default to the new and have some legacy config flag. This will signal to users that they really should be changing over.

Also, because we have the conversion logic already in place. If you are using the old logic and need to switch, we have implemented a conversion flag that will convert the old config to the current alpha structure, this will aid users in migration

JoelSpeed avatar Mar 13 '21 08:03 JoelSpeed

Is alpha in flag and in documentation refers to the stage of development rather than feature name? It will be better to name the flag as --structured-config and in documentation keep warning that structured-config is on alpha stage. Later you will change or remove stage related warning, but do not change the feature name.

Also conversion flag is useful, but it would not help in all cases. E.g. we are using .cfg only to keep secrets(3 parameters ) and majority of parameters are specified as arguments in Kubernetes deployment yaml. It means that we will need to do arguments conversion manually, and it could be a reason for slow adoption of new version, if it would not support old config for backward compatibility for long period.

MNF avatar Mar 13 '21 23:03 MNF

Is alpha in flag and in documentation refers to the stage of development rather than feature name? It will be better to name the flag as --structured-config and in documentation keep warning that structured-config is on alpha stage. Later you will change or remove stage related warning, but do not change the feature name.

I wanted it to be abundantly clear that when configuring this, it is in alpha stage and may change at any point. I guess it could have been --alpha-structured-config but the long term plan is that it will replace --config so it made sense to me to just be --alpha-config.

Also conversion flag is useful, but it would not help in all cases. E.g. we are using .cfg only to keep secrets(3 parameters ) and majority of parameters are specified as arguments in Kubernetes deployment yaml. It means that we will need to do arguments conversion manually, and it could be a reason for slow adoption of new version, if it would not support old config for backward compatibility for long period.

Long term, the number of flags will be greatly reduced and the only option will be a config file (which in some cases, e.g. for secrets, will allow you to specify an env or separate file to load from). The conversion actually takes in the whole config, flags, env and from file, and then converts to the new structure. So any flags etc will end up in the new file.

As we get further into the project and start finalising new features/structures we will start adding documentation about the conversion and in particular values that should be considered secret and removed from the file/replaced with env vars.

JoelSpeed avatar Mar 14 '21 11:03 JoelSpeed

This issue has been inactive for 60 days. If the issue is still relevant please comment to re-activate the issue. If no action is taken within 7 days, the issue will be marked closed.

github-actions[bot] avatar May 14 '21 00:05 github-actions[bot]

This issue has been inactive for 60 days. If the issue is still relevant please comment to re-activate the issue. If no action is taken within 7 days, the issue will be marked closed.

github-actions[bot] avatar Jul 14 '21 00:07 github-actions[bot]

is this really stale?

frittentheke avatar Jul 14 '21 06:07 frittentheke

This issue has been inactive for 60 days. If the issue is still relevant please comment to re-activate the issue. If no action is taken within 7 days, the issue will be marked closed.

github-actions[bot] avatar Sep 13 '21 00:09 github-actions[bot]

not stale

frittentheke avatar Oct 06 '21 17:10 frittentheke

This issue has been inactive for 60 days. If the issue is still relevant please comment to re-activate the issue. If no action is taken within 7 days, the issue will be marked closed.

github-actions[bot] avatar Dec 06 '21 00:12 github-actions[bot]

not stale

frittentheke avatar Dec 07 '21 21:12 frittentheke

This issue has been inactive for 60 days. If the issue is still relevant please comment to re-activate the issue. If no action is taken within 7 days, the issue will be marked closed.

github-actions[bot] avatar Feb 06 '22 01:02 github-actions[bot]