opentelemetry-collector icon indicating copy to clipboard operation
opentelemetry-collector copied to clipboard

Restrict characters allowed in expand provider URIs

Open mx-psi opened this issue 3 years ago • 5 comments
trafficstars

During review of #4742 I noticed that we basically allow arbitrary characters on the identifiers used for environment variable expansion (and soon expansion from other kinds of sources). This is too permissive, since it will restrict us from ever supporting recursion ${ENV_${RECURSIVE}}, URI-like parameters for default (e.g. ${ENV?default=something}) or other features that we may want to add in the future.

We need to validate identifiers to exclude at least these characters ($, {, }, ?, =). Expanding the set of allowed identifiers is not a breaking change, so we should err on the side of caution and be conservative about what characters we allow.

One possibility would be to adopt the Unicode identifier and pattern syntax, which is commonly used in programming languages. We can be even more restrictive if we think any character in that set would potentially be used in the future for special syntax.

mx-psi avatar Jul 19 '22 09:07 mx-psi

Note: this should not, in principle, apply to URIs from configuration sources, otherwise we can't support things like the yaml provider.

mx-psi avatar Jul 19 '22 10:07 mx-psi

Since right now we only have the env provider, we can explore the environment variables case. Here is some info for Windows (many special characters allowed) and for Linux (anything goes in principle, the recommendation is to use the Portable Character Set).

mx-psi avatar Jul 20 '22 07:07 mx-psi

The use-cases: ${ENV_${RECURSIVE}} and ${ENV?default=something} seem very different to me.

Here are some suggestions/questions:

  1. I suggest that we do NOT support this ${ENV?default=something} and only support ${env:ENV?default=something} and the "non-schema" (not an URI like) support that we offer, is only for backwards compatibility and we offer only ${ENV} and $ENV.
  2. I suggest to not add any new feature to the "non-schema" expansion logic, and keep that for backwards compatibility as much as we want, maybe eventually consider to remove it.
  3. I think it is impossible to support (ever) something like ${http://my.domain.com?os=${OS}} unless we add lots of restrictions about what the opaque value, because we don't know if ${OS} should be an expended environment or any other interpretation from the provider.
  4. For the opaque value unless we add at least a restriction to not contain { and } (which is hard to do since we want to support something like a JSON/YAML provider) it is impossible to support ${env:VALUE}${env:VALUE}" because we cannot now where to end the parsing at first or second }`.

If we agree on this, then from https://github.com/open-telemetry/opentelemetry-collector/pull/4742 we should remove the logic of expanding "non-schema" things, and expand only the "schema" URIs. This works well with the other env variables, since ":" is disallowed in the environment variables names, but we still have the problem from 4.

bogdandrutu avatar Aug 05 '22 23:08 bogdandrutu

The use-cases: ${ENV_${RECURSIVE}} and ${ENV?default=something} seem very different to me.

My point is that if there is a reasonable chance that we will want to implement this, we should not shoot ourselves in the foot now by allowing any character. Actually implementing either of these is a different discussion. If we think either of this will have no chance of being implemented, then we can of course ignore that case.

I agree with 1 and 2, we can leave the non-URI as it is. I am unsure about (3): if we disallow $,{, } (unless escaped in some way) then I don't see why this would not work.

which is hard to do since we want to support something like a JSON/YAML provider

JSON/YAML provider is useful when describing configuration sources, but I don't see how it is useful in confmap providers. What's the advantage of writing key: ${yaml:<VALUE>} instead of key: <VALUE>?

mx-psi avatar Aug 08 '22 08:08 mx-psi

The current plan is:

  • [x] Merge https://github.com/open-telemetry/opentelemetry-collector/pull/5861 which documents the limitation for the scheme not for the opaque_data, but we need to restrict characters in that part as well. (#5861)
  • [x] Ensure by using some regexp that the new expand mechanism that uses the providers apply only to URI like embeddings. (#5863)
  • [x] Investigate restrictions/encodings to be used so that we can support ${http://my.domain.com?os=${OS}} and ${env:VALUE}${env:VALUE} in the future or now if easy enough.
  • [x] [from @mx-psi] Should we also make service.NewConfigProvider fail if it we pass a provider that does not have a valid scheme?

bogdandrutu avatar Aug 09 '22 15:08 bogdandrutu

Last item is tracked in a separate issue https://github.com/open-telemetry/opentelemetry-collector/issues/6274

bogdandrutu avatar Oct 11 '22 18:10 bogdandrutu