i/p/requestprompts: add timeout for prompts
When there has been no retrieval of prompt details for a given user after a particular duration, expire all outstanding prompts for that user, as this suggests that there is no prompt client running for that user.
If a user retrieves prompt details or interacts with prompts in some way, such as by retrieving all prompts, a prompt by ID, or attempting to reply to a prompt, bump the timeout to a much longer timeout, since this indicates that a prompt client is running for that user.
This is tracked internally by https://warthogs.atlassian.net/browse/SNAPDENG-28665
This doesn't strictly need to land for 2.65, but up to you @ernestl whether you'd like to include it, if it can be approved quickly.
CC @sminez :)
race checker isn't happy:
==================
WARNING: DATA RACE
Read at 0x000000b954a0 by goroutine 52:
github.com/snapcore/snapd/interfaces/prompting/requestprompts.(*PromptDB).AddOrMerge.func1()
/home/runner/work/snapd/snapd/src/github.com/snapcore/snapd/interfaces/prompting/requestprompts/requestprompts.go:312 +0x130
Previous write at 0x000000b954a0 by goroutine 36:
github.com/snapcore/snapd/interfaces/prompting/requestprompts_test.(*requestpromptsSuite).TestPromptExpiration.MockInitialTimeout.Backup[go.shape.int64].func4()
/home/runner/work/snapd/snapd/src/github.com/snapcore/snapd/testutil/base.go:70 +0x35
runtime.deferreturn()
/snap/go/10698/src/runtime/panic.go:605 +0x5d
runtime.call16()
/snap/go/10698/src/runtime/asm_amd64.s:775 +0x42
reflect.Value.Call()
/snap/go/10698/src/reflect/value.go:365 +0xb5
gopkg.in/check%2ev1.(*suiteRunner).forkTest.func1()
/home/runner/work/snapd/snapd/src/github.com/snapcore/snapd/vendor/gopkg.in/check.v1/check.go:775 +0x965
gopkg.in/check%2ev1.(*suiteRunner).forkCall.func1()
/home/runner/work/snapd/snapd/src/github.com/snapcore/snapd/vendor/gopkg.in/check.v1/check.go:669 +0xe9
Goroutine 52 (running) created at:
time.goFunc()
/snap/go/10698/src/time/sleep.go:215 +0x44
Goroutine 36 (finished) created at:
gopkg.in/check%2ev1.(*suiteRunner).forkCall()
/home/runner/work/snapd/snapd/src/github.com/snapcore/snapd/vendor/gopkg.in/check.v1/check.go:666 +0x5ba
gopkg.in/check%2ev1.(*suiteRunner).forkTest()
/home/runner/work/snapd/snapd/src/github.com/snapcore/snapd/vendor/gopkg.in/check.v1/check.go:757 +0x155
gopkg.in/check%2ev1.(*suiteRunner).runTest()
/home/runner/work/snapd/snapd/src/github.com/snapcore/snapd/vendor/gopkg.in/check.v1/check.go:812 +0x419
gopkg.in/check%2ev1.(*suiteRunner).run()
/home/runner/work/snapd/snapd/src/github.com/snapcore/snapd/vendor/gopkg.in/check.v1/check.go:618 +0x3c6
gopkg.in/check%2ev1.Run()
/home/runner/work/snapd/snapd/src/github.com/snapcore/snapd/vendor/gopkg.in/check.v1/run.go:92 +0x44
gopkg.in/check%2ev1.RunAll()
/home/runner/work/snapd/snapd/src/github.com/snapcore/snapd/vendor/gopkg.in/check.v1/run.go:84 +0x124
gopkg.in/check%2ev1.TestingT()
/home/runner/work/snapd/snapd/src/github.com/snapcore/snapd/vendor/gopkg.in/check.v1/run.go:72 +0x5d3
github.com/snapcore/snapd/interfaces/prompting/requestprompts_test.Test()
/home/runner/work/snapd/snapd/src/github.com/snapcore/snapd/interfaces/prompting/requestprompts/requestprompts_test.go:44 +0x26
testing.tRunner()
/snap/go/10698/src/testing/testing.go:1690 +0x226
testing.(*T).Run.gowrap1()
/snap/go/10698/src/testing/testing.go:1743 +0x44
==================
Found 1 data race(s)
Per discussion with Innes, we'd like to pursue (1), where snapd recognizes a client being active by it retrieving prompt details and/or replying to prompts. A dedicated ACK endpoint would also work, but then there would need to be extra communication and synchronization, within the client, within snapd, and between the client and snapd. At the moment, this isn't necessary, and would add complexity.
Rebasing to resolve conflicts with interfaces/prompting/requestprompts/requestprompts.go
Changed to 2.67 after discussion, this requires more time.
I've added dedicated clientActivity fields which get passed through from the daemon to the ifacestate manager to the requestprompts backend. For now, the daemon has hard-coded which endpoints qualify as client activity (namely, getPrompt, getPrompts, and postPrompt), but in the future we could expand this in one of a few ways:
- Clients could pass an argument to the HTTP request indicating whether to count the request as client activity
- Snapd could (potentially) check the origin of the connection, and if it's from an app which is registered as a handler service, it will treat the request as client activity
Another question is whether to count client activity which strictly touches the rules backend (e.g. getting a rule, as opposed to adding a rule, the latter of which results in a pass over outstanding prompts to see if any are now matched) as client activity. For now, we do not. If we did, we'd probably want to add a dedicated method on the prompts backend which simply bumps the timeout without doing anything else, since those API requests wouldn't otherwise have any effect on the prompts.
For now, though, I think this is sufficient for our use case, as the official prompting client never accesses, adds, or modifies rules directly, and requests to the prompt-related endpoints are treated as client activity. Please let me know if this sounds good to you @sminez
Codecov Report
Attention: Patch coverage is 96.18321% with 5 lines in your changes missing coverage. Please review.
Project coverage is 79.05%. Comparing base (
96ea7b0) to head (1a9a804). Report is 132 commits behind head on master.
| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...erfaces/prompting/requestprompts/requestprompts.go | 95.53% | 4 Missing and 1 partial :warning: |
Additional details and impacted files
@@ Coverage Diff @@
## master #14390 +/- ##
==========================================
+ Coverage 78.95% 79.05% +0.10%
==========================================
Files 1084 1087 +3
Lines 146638 147949 +1311
==========================================
+ Hits 115773 116957 +1184
- Misses 23667 23757 +90
- Partials 7198 7235 +37
| Flag | Coverage Δ | |
|---|---|---|
| unittests | 79.05% <96.18%> (+0.10%) |
:arrow_up: |
Flags with carried forward coverage won't be shown. Click here to find out more.
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
@olivercalder that all sounds good from the client side :+1: I agree that hitting the rules endpoints shouldn't bump the timeout in any way so lets leave things as you describe in your most recent comment. If we do want to make things smarter in future I'd prefer it were done by snapd determining that the request came from a "real" client rather than the client claiming that it is "real" as that then opens a whole can of worms I suspect. Presumably this can be done statically by setting up the interface for prompting in the right way?
Rebased to resolve merge conflicts and pull in some test fixes from master.
Adding the timer interface and fake timer to timeutil more than doubled the size of this PR. Perhaps that commit needs to be spun off into its own PR and landed first?
This PR now depends on https://github.com/canonical/snapd/pull/14672
Now that the mockable timers PR #14672 has landed, this PR has been rebased and is ready for review.
@olivercalder there are some outstanding comments, can you have a look?
I accidentally force pushed away some changes built on the mockable timers a few weeks ago.... thankfully I can retrieve them, but ouch. Final straw pushing me to always use git worktree from now on, no more mess of out-of-sync cloned repos.
Spotted something in one of my notes. When we persist prompts across snapd restarts, it may be beneficial to have a longer initial timeout for prompts when snapd is first started, in case startup takes some time and the client doesn't have a chance to respond before the short initial timeout. Perhaps manually setting timeouts for existing user prompt DBs to activityTimeout during the prompt load() function (which does not yet exist) will suffice.
Test failures:
- google:ubuntu-20.04-64:tests/unit/go:clang --- settle is not converging, but only on clang, not sure why
- openstack:fedora-41-64:tests/upgrade/basic --- problem with mount type squashfs
- google-distro-2:opensuse-tumbleweed-64:tests/main/microk8s-smoke:edge
- google-distro-2:opensuse-tumbleweed-64:tests/main/prepare-image-validation-sets
These are all unrelated (prompting doesn't even run on any of these systems) so I think this PR is safe to merge.