multipass
multipass copied to clipboard
[feat] Add wait-ready command to check Multipass daemon readiness
Feature: Add multipass wait-ready command
Description:
This PR introduces a new CLI command, multipass wait-ready. This command allows users and scripts to pause execution until the Multipass daemon is fully initialized and ready to accept requests. This is particularly beneficial for automation workflows that need to ensure the daemon is operational before proceeding with other Multipass operations.
The wait-ready command works by:
- Initially polling the daemon's gRPC socket until it becomes available.
- Once the socket is available, it sends a
WaitReadyrequest to the daemon. - The daemon, upon receiving this request, verifies its readiness by attempting to update image manifests (which implicitly checks connectivity to image servers).
- The command supports a
--timeoutoption (in seconds). If the daemon isn't ready within the specified timeout, the command exits with an error. If no timeout is given, it waits indefinitely. - An animated spinner and context-relevant message provide user feedback during the waiting period.
Motivation:
- Improves the reliability of automated scripts and CI/CD pipelines that interact with Multipass by providing a deterministic way to wait for daemon initialization.
- Addresses scenarios where subsequent Multipass commands might fail if the daemon is still starting up.
Related Issues: Fixes #452 Fixes #3723
Key Implementation Changes:
- Client-side (
wait_ready.cpp): New command logic, including retry for socket connection, timeout handling (reusingmp::cmd::parse_timeout), and spinner integration. - Protocol (
multipass.proto): AddedWaitReadyRequest/WaitReadyReplymessages andWaitReadyRPC method. - Daemon-side (
daemon_rpc.cpp,daemon.cpp):- New gRPC service handler in
daemon_rpc.cpp. - Core readiness check in
daemon.cppleverages the existingwait_update_manifests_all_and_optionally_applied_force()function.
- New gRPC service handler in
How to Test:
- Build this branch and
$ cd build. - Ensure the Multipass daemon is stopped.
- To verify
--timeout, simply run thewait-readycommand with the--timeoutoption:
$ ./bin/multipass wait-ready --timeout 5
Be sure not to start up the daemon first or the command might exit before the timeout.
- Expected: Info message "Waiting for Multipass daemon to be ready...", spinner is shown, command times out after ~5 seconds with an appropriate error message "Timed out waiting for Multipass daemon to be ready."
- (Optional) To replicate the issue that happens when the client tries to use the daemon before its ready (without
wait-ready), start up the daemon and immediately usemultipass launch:
$ sudo -v
$ (sudo ./bin/multipassd &) && ./bin/multipass launch
- Expected: Error message "launch failed: cannot connect to the multipass socket", "launch failed: Remote "" is unknown or unreachable.", or something like issue #3723
- Stop the Multipass daemon
$ sudo pkill multipassd
- To verify
wait-readycommand solves the issue, fit thewait-readycommand between daemon startup andmultipass launch:
$ sudo -v
$ (sudo ./bin/multipassd &) && ./bin/multipass wait-ready && ./bin/multipass launch
Optionally, add
--timeoutif desired.
- Expected: Info message "Waiting for Multipass daemon to be ready...", spinner is shown, command exits successfully (ReturnCode::Ok) once the daemon is ready (i.e., can connect to image servers) and the client begins retrieving the default release image for the
multipass launchprocess. You may Ctrl+C here to terminate the launch process.
Points for Reviewers:
- Confirmation that using
wait_update_manifests_all_and_optionally_applied_force(true)is an appropriate and robust proxy for overall daemon readiness. - Feedback on the client-side error messages and user experience (spinner, timeout message) is appreciated. The current spinner implementation addresses @ricab's suggestion in #452.
- Unit Tests: Per @ricab 's comments in issue #452, unit tests have not yet been implemented for this initial PR to facilitate earlier review. I am prepared to add them based on feedback.
Hey @rluna319, thank you for this! We'll review when we get a chance.
Hey @levkropp, would you mind doing a secondary review of this one? Than I can do a very light "tertiary" review.
@rluna319, could you also rebase on the latest main, to see if we can get some CI to run on this? Appreciate it.
@ricab I had not included your requested changes on my latest commit. I had not seen them before I made the push, my fault. I did rebase on the latest main however.
I will work on your requested changes ASAP.
Would you like me to amend your requested changes to my latest commit or have it in a separate commit?
Hey @rluna319 no worries. We generally prefer additional commits, to make it easy to identify new changes. Thanks!
@ricab I've added all of your requested changes to the recent commit. I also rebased on the latest main as well.
Hey @rluna319, it looks like you haven't signed our CLA yet. Would you mind doing so, please? https://ubuntu.com/legal/contributors
Also, there are a few linting problems. Editors often have options to strip trailing whitespace and keep a single newline at the end. You can check whitespace sanity on a diff with git diff --check .... For example: git diff --check main...wait-ready.
git clang-format might take care of whitespaces too, if you set it up. That might be a good idea anyway because it's going to run once the whitespace check succeeds.
@ricab I signed the CLA, went through linter errors (and other CI build/test errors) and fixed them to the best of my knowledge, ran the code through git clang-format and git diff --check main. I've also rebased off the latest main.
I removed some commented out includes that shouldn't have been in that commit.
I see that I still missed some linter errors. I thought I ran everything through clang-format correctly. I'll address those.
@ricab Ive addressed the issues. It says some of the workflows are awaiting approval to be run.
@ricab Ive addressed the issues. It says some of the workflows are awaiting approval to be run.
Approved the workflow.
Thank you @xmkg.
I think that the remaining workflows that fail are due to some internal CI issues. Something about a missing token. Is this something I need to fix or can fix on my end?
Thank you @xmkg.
I think that the remaining workflows that fail are due to some internal CI issues. Something about a missing token. Is this something I need to fix or can fix on my end?
Yeah, unfortunately, our CI is not happy with PRs coming from external forks. We're working on making this process smoother, but the necessary machinery is not yet in place. Would you mind if I create a PR for your PR, just for the sake of running the pipeline? We can track the progress and make changes here.
Oh ok. Sure thing.
Opened https://github.com/canonical/multipass/pull/4304.
@rluna319, we've landed a bunch of CI-related fixes to main. Could you rebase this PR?
@xmkg I have rebased the PR. Workflows awaiting approval.
@xmkg I have rebased the PR. Workflows awaiting approval.
Great, thanks. Synced #4304.
@xmkg Great thank you. I see all the CI tests have passed in #4304. Should I mind the coverage report? I've looked at it already and I can cover the line in wait_ready.cpp that it says isn't covered.
However, i'm having trouble covering the other one in daemon.cpp where we catch any other std::exception. Throwing a runtime error in update_manifests just causes the test to hang. I'm guessing its the way that exception is or isn't handled in the asynchronous task. I'm not sure what other exception would make it to that catch block. Any other exceptions that ive traced within the codepath are already handled locally. Perhaps i'm just missing something i'm not aware of.
@xmkg Great thank you. I see all the CI tests have passed in #4304. Should I mind the coverage report? I've looked at it already and I can cover the line in
wait_ready.cppthat it says isn't covered.
That would be nice.
However, i'm having trouble covering the other one in
daemon.cppwhere we catch any other std::exception. Throwing a runtime error in update_manifests just causes the test to hang. I'm guessing its the way that exception is or isn't handled in the asynchronous task.
I'd say don't worry about it. There's a way to trigger that exception catcher... but it's a bit cheeky, though. We could use a logger that throws an exception on .log(). That would make the coverage tool happy, but I'm okay with it as-is, too.
@xmkg I added the coverage for wait_ready.cpp.
I tried to implement your suggestion, throwing and exception on log(), but wasn't able to do it. I kept getting the error that the Logger has no member named "gmock_log". I thought maybe I would have to add a mock method for log() in mock_logger.cpp but I felt that would be overkill if coverage for that line wasn't crucial. I tried doing a local mock method for the logger as well but I couldn't get that to work either. I may just lack the experience to implement this properly.
Awesome! Thank you for allowing me to contribute @xmkg @ricab. A great experience, I learned a lot. Looking forward to the next!
Our pleasure @rluna319! This was a great contribution, very well pondered. Feel free to send more in!