opentelemetry-collector icon indicating copy to clipboard operation
opentelemetry-collector copied to clipboard

Watch for config file changes in fileprovider

Open swiatekm opened this issue 3 years ago • 4 comments
trafficstars

Description: Make the file config provider watch for changes to the config file.

This is done via polling the file every second, with an additional grace period of two seconds for actually delivering updates to the collector. We're not using notification mechanisms provided by the OS because they're unreliable or annoyingly complex to make work for some popular use cases - most notably Kubernetes ConfigMaps. As we don't need frequent updates and our watched files are small, polling is both performant and easier to understand than, say, inotify.

This change is more complex than it technically needs to be because it tries to reasonably deal with misbehaving users. For example, it'll quietly accept a user calling .Shutdown before .Close:

r, err := provider.Retrieve("file:/path/to/config")
provider.Shutdown()
r.Close()

and r.Close() is even thread-safe. If we're ok with the above resulting in a panic or a deadlock, I can make both the provider and the filewatcher simpler.

Link to tracking Issue: https://github.com/open-telemetry/opentelemetry-collector/issues/273

Testing: Added additional tests for the provider and separate unit tests for the filewatcher. I also did a couple manual e2e tests of the more interesting edge cases, like network filesystems and deep symlink chains.

swiatekm avatar Aug 22 '22 16:08 swiatekm

Codecov Report

Merging #5945 (287c510) into main (62d41a8) will increase coverage by 0.05%. The diff coverage is 96.22%.

@@            Coverage Diff             @@
##             main    #5945      +/-   ##
==========================================
+ Coverage   91.90%   91.95%   +0.05%     
==========================================
  Files         200      201       +1     
  Lines       12414    12568     +154     
==========================================
+ Hits        11409    11557     +148     
- Misses        793      797       +4     
- Partials      212      214       +2     
Impacted Files Coverage Δ
confmap/provider/fileprovider/filewatcher.go 93.10% <93.10%> (ø)
confmap/provider/fileprovider/provider.go 100.00% <100.00%> (ø)

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

codecov[bot] avatar Aug 23 '22 10:08 codecov[bot]

I love this idea, but my gut says we should not make it a default for fileprovide initially. The current expectation for the file provide (and performance expectations) is that the file is not watched. It feels appropriate to either hide this capability behind a feature gate or as a different provider/flag. That way existing customers have to opt-in for now.

Sounds reasonable to me. It'd be pretty straightforward to gate this, considering all the new functionality is hidden behind a single conditional right now.

swiatekm avatar Aug 23 '22 18:08 swiatekm

In all honesty, after thinking about it more, different usecases may want to have this enabled or disabled in general. In Kubernetes, for example, you typically want to treat Pods as immutable, and don't want them to quietly reload configuration just because you made a change to a ConfigMap. Arguably it should be the users' responsibility to make sure Kubernetes doesn't reload the file inside the Pod, but that's an obscure enough topic that I'm not confident in users not shooting themselves in the foot with it.

I think I'm going to open an issue about this so we can more carefully weigh the pros and cons. This probably applies more generally to configuration reloading for any provider, this one is simply the first in line to have it implemented.

swiatekm avatar Aug 24 '22 16:08 swiatekm

This PR was marked stale due to lack of activity. It will be closed in 14 days.

github-actions[bot] avatar Sep 08 '22 03:09 github-actions[bot]

This PR was marked stale due to lack of activity. It will be closed in 14 days.

github-actions[bot] avatar Sep 24 '22 04:09 github-actions[bot]

This PR was marked stale due to lack of activity. It will be closed in 14 days.

github-actions[bot] avatar Oct 10 '22 04:10 github-actions[bot]

This PR was marked stale due to lack of activity. It will be closed in 14 days.

github-actions[bot] avatar Oct 25 '22 04:10 github-actions[bot]

This PR was marked stale due to lack of activity. It will be closed in 14 days.

github-actions[bot] avatar Nov 09 '22 03:11 github-actions[bot]

This PR was marked stale due to lack of activity. It will be closed in 14 days.

github-actions[bot] avatar Dec 08 '22 03:12 github-actions[bot]

Closed as inactive. Feel free to reopen if this PR is still being worked on.

github-actions[bot] avatar Dec 23 '22 03:12 github-actions[bot]