daemon exit is blocked if the client is currently connected

Open townsend2010 opened this issue 6 years ago • 3 comments

If the daemon tries to exit while a client is connected, for example during a launch, the exit is blocked in the gRPC server shutdown because of the active client connection.

This needs further investigation to determine whether it is a gRPC issue or an error in how we shut down the gRPC server in such cases.

townsend2010 avatar Oct 17 '19 19:10 townsend2010

Here is a stack trace of when the issue occurs:

(gdb) bt full
#0  0x00007ff349558bb7 in epoll_wait (epfd=5, events=0x555d755385d8, maxevents=100, timeout=-1) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
        resultvar = 18446744073709551612
        sc_cancel_oldtype = 0
        sc_ret = <optimized out>
#1  0x0000555d733c500e in pollable_epoll (p=0x555d75538530, deadline=9223372036854775807) at /home/townsend/multipass/multipass/3rd-party/grpc/src/core/lib/iomgr/ev_epollex_linux.cc:1004
        timeout = -1
        r = 32766
#2  0x0000555d733c5aaf in pollset_work (pollset=0x555d756dfd60, worker_hdl=0x0, deadline=9223372036854775807)
    at /home/townsend/multipass/multipass/3rd-party/grpc/src/core/lib/iomgr/ev_epollex_linux.cc:1192
        worker = {kicked = false, initialized_cv = false, originator = 20552, cv = pthread_cond_t = {Threads known to still execute a wait function = 2731, Clock ID = CLOCK_REALTIME, Shared = Yes}, 
          pollset = 0x555d756dfd60, pollable_obj = 0x555d75538530, links = {{next = 0x7ffe1c317a90, prev = 0x7ffe1c317a90}, {next = 0x7ffe1c317a90, prev = 0x7ffe1c317a90}}}
        err_desc = 0x555d7375d379 "pollset_work"
        error = 0x0
#3  0x0000555d7336808a in pollset_work (pollset=0x555d756dfd60, worker=0x0, deadline=9223372036854775807) at /home/townsend/multipass/multipass/3rd-party/grpc/src/core/lib/iomgr/ev_posix.cc:317
        err = 0x555d756dfce0
#4  0x0000555d732fa811 in grpc_pollset_work (pollset=0x555d756dfd60, worker=0x0, deadline=9223372036854775807) at /home/townsend/multipass/multipass/3rd-party/grpc/src/core/lib/iomgr/pollset.cc:48
No locals.
#5  0x0000555d73318635 in cq_next (cq=0x555d756dfc50, deadline=..., reserved=0x0) at /home/townsend/multipass/multipass/3rd-party/grpc/src/core/lib/surface/completion_queue.cc:1033
        iteration_deadline = 9223372036854775807
        c = 0x0
        err = 0x0
        ret = {type = (unknown: 473005104), success = 32766, tag = 0x0}
        cqd = 0x555d756dfce0
        deadline_millis = 9223372036854775807
        is_finished_arg = {last_seen_things_queued_ever = 0, cq = 0x555d756dfc50, deadline = 9223372036854775807, stolen_completion = 0x0, tag = 0x0, first_loop = true}
        exec_ctx = {<grpc_core::ExecCtx> = {_vptr.ExecCtx = 0x555d73bfc9a0 <vtable for ExecCtxNext+16>, closure_list_ = {head = 0x0, tail = 0x0}, combiner_data_ = {active_combiner = 0x0, 
              last_combiner = 0x0}, flags_ = 0, starting_cpu_ = 3, now_is_valid_ = false, now_ = 0, static exec_ctx_ = {value = 140729371425840}, last_exec_ctx_ = 0x0}, 
          check_ready_to_finish_arg_ = 0x7ffe1c317c00}
#6  0x0000555d73318adb in grpc_completion_queue_next (cq=0x555d756dfc50, deadline=..., reserved=0x0) at /home/townsend/multipass/multipass/3rd-party/grpc/src/core/lib/surface/completion_queue.cc:1108
No locals.
#7  0x0000555d732ea954 in grpc::CompletionQueue::AsyncNextInternal (this=0x7ffe1c317e00, tag=0x7ffe1c317dd8, ok=0x7ffe1c317dcb, deadline=...)
    at /home/townsend/multipass/multipass/3rd-party/grpc/src/cpp/common/completion_queue_cc.cc:56
        ev = {type = (unknown: 473005472), success = 32766, tag = 0x7ffe1c317d60}
#8  0x0000555d732ded3d in grpc::CompletionQueue::AsyncNext<gpr_timespec> (this=0x7ffe1c317e00, tag=0x7ffe1c317dd8, ok=0x7ffe1c317dcb, deadline=...)
    at /home/townsend/multipass/multipass/3rd-party/grpc/include/grpcpp/impl/codegen/completion_queue.h:190
        deadline_tp = {time_ = {tv_sec = 9223372036854775807, tv_nsec = 0, clock_type = GPR_CLOCK_MONOTONIC}}
#9  0x0000555d732dba82 in grpc::Server::ShutdownInternal (this=0x555d75539450, deadline=...) at /home/townsend/multipass/multipass/3rd-party/grpc/src/cpp/server/server_cc.cc:629
        shutdown_cq = {<grpc::GrpcLibraryCodegen> = {_vptr.GrpcLibraryCodegen = 0x555d73bf32a0 <vtable for grpc::CompletionQueue+16>, grpc_init_called_ = true}, cq_ = 0x555d756dfc50, 
          avalanches_in_flight_ = 0}
        ok = 115
        status = (grpc::CompletionQueue::GOT_EVENT | unknown: 21852)
        shutdown_tag = {<grpc::internal::CompletionQueueTag> = {_vptr.CompletionQueueTag = 0x555d73bdc558 <vtable for grpc::(anonymous namespace)::ShutdownTag+16>}, <No data fields>}
        tag = 0x4f56989ef3e38b00
        lock = {_M_device = 0x555d75539490, _M_owns = true}
#10 0x0000555d732d1a7e in grpc::ServerInterface::Shutdown (this=0x555d75539450) at /home/townsend/multipass/multipass/3rd-party/grpc/include/grpcpp/impl/codegen/server_interface.h:95
No locals.
#11 0x0000555d732daaae in grpc::Server::~Server (this=0x555d75539450, __in_chrg=<optimized out>) at /home/townsend/multipass/multipass/3rd-party/grpc/src/cpp/server/server_cc.cc:452
        lock = {_M_device = 0x555d75539490, _M_owns = false}
        lock = <optimized out>
        it = <optimized out>
#12 0x0000555d732dac34 in grpc::Server::~Server (this=0x555d75539450, __in_chrg=<optimized out>) at /home/townsend/multipass/multipass/3rd-party/grpc/src/cpp/server/server_cc.cc:462
        lock = <optimized out>
        it = <optimized out>
#13 0x0000555d7313ba06 in std::default_delete<grpc::Server>::operator() (this=0x7ffe1c318288, __ptr=0x555d75539450) at /usr/include/c++/7/bits/unique_ptr.h:78
No locals.
#14 0x0000555d7313b3b1 in std::unique_ptr<grpc::Server, std::default_delete<grpc::Server> >::~unique_ptr (this=0x7ffe1c318288, __in_chrg=<optimized out>) at /usr/include/c++/7/bits/unique_ptr.h:268
        __ptr = @0x7ffe1c318288: 0x555d75539450
        __ptr = <optimized out>
#15 0x0000555d731389cc in multipass::DaemonRpc::~DaemonRpc (this=0x7ffe1c318230, __in_chrg=<optimized out>) at /home/townsend/multipass/multipass/src/daemon/daemon_rpc.h:42
No locals.
#16 0x0000555d730e1089 in multipass::Daemon::~Daemon (this=0x7ffe1c3180c0, __in_chrg=<optimized out>) at /home/townsend/multipass/multipass/src/daemon/daemon.cpp:745
No locals.
#17 0x0000555d730b996b in main (argc=5, argv=0x7ffe1c318578) at /home/townsend/multipass/multipass/src/daemon/daemon_main.cpp:123
        app = <incomplete type>
        builder = {url_downloader = std::unique_ptr<multipass::URLDownloader> = {get() = 0x0}, factory = std::unique_ptr<multipass::VirtualMachineFactory> = {get() = 0x0}, 
          image_hosts = std::vector of length 0, capacity 0, vault = std::unique_ptr<multipass::VMImageVault> = {get() = 0x0}, name_generator = std::unique_ptr<multipass::NameGenerator> = {get() = 0x0}, 
          ssh_key_provider = std::unique_ptr<multipass::SSHKeyProvider> = {get() = 0x0}, cert_provider = std::unique_ptr<multipass::CertProvider> = {get() = 0x0}, 
          client_cert_store = std::unique_ptr<multipass::CertStore> = {get() = 0x0}, update_prompt = std::unique_ptr<multipass::UpdatePrompt> = {get() = 0x0}, 
          logger = std::unique_ptr<multipass::logging::Logger> = {get() = 0x0}, cache_directory = {static null = {<No data fields>}, d = 0x555d7550d390}, data_directory = {
            static null = {<No data fields>}, d = 0x555d7550f7a0}, server_address = "unix:/run/multipass_socket", ssh_username = "ubuntu", days_to_expire = {__r = 14}, image_refresh_timer = {__r = 6}, 
          verbosity_level = multipass::logging::Level::debug, connection_type = multipass::RpcConnectionType::ssl}
        config = std::unique_ptr<const multipass::DaemonConfig> = {get() = 0x0}
        server_address = "unix:/run/multipass_socket"
        daemon = {<QObject> = {<No data fields>}, <multipass::VMStatusMonitor> = {_vptr.VMStatusMonitor = 0x555d73bef390 <vtable for multipass::Daemon+320>}, static staticMetaObject = {d = {
              superdata = 0x7ff34a8a2a00 <QObject::staticMetaObject>, stringdata = 0x555d736dd4c0 <qt_meta_stringdata_multipass__Daemon>, data = 0x555d736dde00 <qt_meta_data_multipass__Daemon>, 
              static_metacall = 0x555d7316c03c <multipass::Daemon::qt_static_metacall(QObject*, QMetaObject::Call, int, void**)>, relatedMetaObjects = 0x0, extradata = 0x0}}, 
          config = std::unique_ptr<const multipass::DaemonConfig> = {get() = 0x555d75520a00}, vm_instance_specs = std::unordered_map with 2 elements = {["test-core"] = {num_cores = 1, mem_size = {
                bytes = 1073741824}, disk_space = {bytes = 5368709120}, mac_addr = "52:54:00:2e:06:63", ssh_username = "ubuntu", state = multipass::VirtualMachine::State::off, 
              mounts = std::unordered_map with 0 elements, deleted = false, metadata = {d = 0x555d754d6360, o = 0x555d755288dc}}, ["test-disco"] = {num_cores = 1, mem_size = {bytes = 1073741824}, 
              disk_space = {bytes = 5368709120}, mac_addr = "52:54:00:7f:d9:58", ssh_username = "ubuntu", state = multipass::VirtualMachine::State::off, mounts = std::unordered_map with 0 elements, 
              deleted = false, metadata = {d = 0x555d754d6360, o = 0x555d755289f4}}}, vm_instances = std::unordered_map with 2 elements = {
            ["test-disco"] = std::shared_ptr<multipass::VirtualMachine> (use count 1, weak count 0) = {get() = 0x555d7553f0f0}, 
            ["test-core"] = std::shared_ptr<multipass::VirtualMachine> (use count 1, weak count 0) = {get() = 0x555d7550e400}}, deleted_instances = std::unordered_map with 0 elements, 
          delayed_shutdown_instances = std::unordered_map with 0 elements, allocated_mac_addrs = std::unordered_set with 3 elements = {[0] = "52:54:00:a6:ae:dc", [1] = "52:54:00:7f:d9:58", 
            [2] = "52:54:00:2e:06:63"}, remote_image_host_map = std::unordered_map with 4 elements = {["daily"] = 0x555d75521b60, ["release"] = 0x555d75521b60, [""] = 0x555d7551ff80, 
            ["snapcraft"] = 0x555d7551ff80}, daemon_rpc = {<QObject> = {<No data fields>}, <multipass::Rpc::Service> = {<No data fields>}, static staticMetaObject = {d = {
                superdata = 0x7ff34a8a2a00 <QObject::staticMetaObject>, stringdata = 0x555d736de1a0 <qt_meta_stringdata_multipass__DaemonRpc>, data = 0x555d736deb20 <qt_meta_data_multipass__DaemonRpc>, 
                static_metacall = 0x555d7316c61c <multipass::DaemonRpc::qt_static_metacall(QObject*, QMetaObject::Call, int, void**)>, relatedMetaObjects = 0x0, extradata = 0x0}}, 
            server_address = "unix:/run/multipass_socket", server = std::unique_ptr<grpc::Server> = {get() = 0x555d75539450}}, source_images_maintenance_task = <incomplete type>, metrics_provider = {
            metrics_url = {d = 0x555d7553b050}, unique_id = {static null = {<No data fields>}, d = 0x555d7553c0a0}, data_path = {static null = {<No data fields>}, d = 0x555d7550f7a0}, metric_batches = {
              d = 0x555d754df610, a = 0x555d75506dd8}, metrics_mutex = {<std::__mutex_base> = {_M_mutex = pthread_mutex_t = {Type = Normal, Status = Not acquired, Robust = No, Shared = No, 
                  Protocol = None}}, <No data fields>}, metrics_cv = {_M_cond = pthread_cond_t = {Threads known to still execute a wait function = 0, Clock ID = CLOCK_REALTIME, Shared = No}}, 
            running = false, metrics_available = false, metrics_sender = {thread = {_M_id = {_M_thread = 0}}}}, metrics_opt_in = {opt_in_status = multipass::OptInStatus_Status_DENIED, 
            delay_opt_in_count = 3}, instance_mounts = warning: RTTI symbol not found for class 'QObject'
{<QObject> = {<No data fields>}, static staticMetaObject = {d = {superdata = 0x7ff34a8a2a00 <QObject::staticMetaObject>, 
                stringdata = 0x555d736f35e0 <qt_meta_stringdata_multipass__SSHFSMounts>, data = 0x555d736f3620 <qt_meta_data_multipass__SSHFSMounts>, 
                static_metacall = 0x555d73194ee6 <multipass::SSHFSMounts::qt_static_metacall(QObject*, QMetaObject::Call, int, void**)>, relatedMetaObjects = 0x0, extradata = 0x0}}, 
            key = "01\202I\363\177\000\000\060\061\202I\363\177\000\000\240XTu]U\000\000\240XTu]U\000\000MIIEowIBAAKCAQEAmO3dp+nZ7e36w7h91R9AMWXZAPnVsy2kLlLuaiQa7hdvtJRw\nWxGDuh1tNE4LgxdOoHbSJBFBap+2Rzz2LJ1cRBqcj/xqfUaAIs+fPyZXskxfEw//\nfO0IDjnD9vXI4AudtIViYWIU80+1F7sYZvtSB/"..., mount_processes = std::unordered_map with 0 elements}, async_future_watchers = std::vector of length 0, capacity 1, 
          async_running_futures = std::unordered_map with 0 elements, start_mutex = {<std::__mutex_base> = {_M_mutex = pthread_mutex_t = {Type = Normal, Status = Not acquired, Robust = No, Shared = No, 
                Protocol = None}}, <No data fields>}, preparing_instances = std::unordered_set with 0 elements, image_update_future = {d = <incomplete type>}}
        ret = 0
        handler = {signal_handling_thread = {thread = {_M_id = {_M_thread = 140682694383360}}}}

townsend2010 avatar Oct 17 '19 20:10 townsend2010

Still relevant. Fixing this would require tracking all active RPC contexts and adding logic to the blocking sections (such as waiting for cloud-init or SSH during launch) so that they exit early if the RPC request is cancelled. Additionally, since the daemon rarely exits without a signal, it could require implementing signal handlers for SIGINT and SIGKILL at least.

tobe2098 avatar Nov 19 '25 09:11 tobe2098
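
As a rough illustration of the "exit early if the RPC request is cancelled" idea in the comment above, here is a minimal sketch of a cancellable blocking wait. It is not taken from the Multipass code base: `WaitResult`, `wait_until_ready`, and the polling interval are hypothetical names and values chosen for the example. The `is_cancelled` callback stands in for a per-request check such as `grpc::ServerContext::IsCancelled()`, and `is_ready` stands in for a condition like "cloud-init has finished" or "SSH is reachable".

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical outcome of a blocking step (not a Multipass type).
enum class WaitResult { ready, cancelled, timed_out };

// Polls `is_ready` until it returns true, the request is cancelled, or the
// timeout expires. Cancellation is checked first on every iteration so that
// a cancelled request never keeps the caller (and a pending shutdown) waiting
// longer than one poll interval.
inline WaitResult wait_until_ready(const std::function<bool()>& is_ready,
                                   const std::function<bool()>& is_cancelled,
                                   std::chrono::milliseconds timeout,
                                   std::chrono::milliseconds poll_interval = std::chrono::milliseconds{10})
{
    assert(poll_interval.count() > 0);
    const auto deadline = std::chrono::steady_clock::now() + timeout;

    while (true)
    {
        if (is_cancelled())
            return WaitResult::cancelled; // stand-in for a cancelled RPC context
        if (is_ready())
            return WaitResult::ready;
        if (std::chrono::steady_clock::now() >= deadline)
            return WaitResult::timed_out;
        std::this_thread::sleep_for(poll_interval);
    }
}
```

A handler using something like this would return as soon as the result is `WaitResult::cancelled`, which is what would let the server shutdown proceed instead of waiting on the in-flight request.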

Hey @tobe2098, just to clarify the signals, we do have custom handling for SIGINT (SIGKILL can't be handled).

ricab avatar Nov 19 '25 13:11 ricab
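
The point about SIGKILL above can be checked with a small POSIX sketch. This is only an illustration, not how Multipass installs its handlers: `can_install_handler` and `noop_handler` are hypothetical names. It relies on `sigaction` failing (with `EINVAL`) when asked to change the disposition of SIGKILL or SIGSTOP.

```cpp
#include <cassert>
#include <csignal>
#include <signal.h>

// Handler that does nothing; used only to probe whether installation succeeds.
static void noop_handler(int) {}

// Returns true if a custom handler can be installed for `signum`.
// The previous disposition is restored, so the probe has no lasting effect.
inline bool can_install_handler(int signum)
{
    assert(signum > 0);

    struct sigaction action = {};
    action.sa_handler = noop_handler;
    sigemptyset(&action.sa_mask);

    struct sigaction previous = {};
    if (sigaction(signum, &action, &previous) != 0)
        return false; // POSIX rejects SIGKILL and SIGSTOP here

    sigaction(signum, &previous, nullptr); // put the old disposition back
    return true;
}
```

With this helper, `can_install_handler(SIGINT)` succeeds while `can_install_handler(SIGKILL)` does not, so any graceful-exit logic has to hang off catchable signals such as SIGINT or SIGTERM.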