valheim-docker icon indicating copy to clipboard operation
valheim-docker copied to clipboard

[Bug] shutdown.rs blocks infinitely if process can't be stopped

Open anonymus19941 opened this issue 3 years ago • 8 comments

I accidentially ran the valheim server as root and didn't realize it. The auto update script, run by cron as user steam, tries to stop the server with odin stop. This calls the function blocking_shutdown in shutdown.rs (I think). This function tries to send an interrupt signal to the server process and then blocks until it doesn't exist anymore. In my case, odin run as steam user couldn't stop the server run as root, so it blocked infinitely and created new processes each time the cron job ran.

This shouldn't happen normaly, as the server usually doesn't run as root, but it's still not really nice and should be fixed, in my opinion.

anonymus19941 avatar Apr 23 '21 18:04 anonymus19941

Ohh good find! I will take a look into this, Ill see about adding a timeout or looking to see if I can get the processes user to throw an error situation on.

mbround18 avatar Apr 23 '21 21:04 mbround18

Alright, I think I understand how I want to tackle this, from an acceptance criteria:

  • Any command launched by odin should error if run as root. Unless an override flag is thrown.
  • In the wait for shutdown, I will change it up to use result<process, error> to pass between the two functions.
  • In the waiting function, it will wait for a maximum of 5 minutes on a timeout before throwing an error.

mbround18 avatar Apr 27 '21 17:04 mbround18

I have a temp fix for this I will publish image

mbround18 avatar May 22 '21 07:05 mbround18

The error should be available now :) been a bit busy with work/school to tackle the underlying issue and will leave this ticket open but it will halt execution if you are running as root.

mbround18 avatar May 22 '21 21:05 mbround18

Kinda forgot about this ... Thanks for the fix. This is definitively helpful.

To the underlying issue: I'm not sure if this is helpful, because I don't know how you would do this, but I tried to fix it (or at least I have started to) according to your message (I don't really know which process you'd want to return, so I left it empty for now). Feel free to use or extend or discard it (I didn't want to create a PR, so here's a patch instead).

Index: src/odin/server/update.rs
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
diff --git a/src/odin/server/update.rs b/src/odin/server/update.rs
--- a/src/odin/server/update.rs	(revision 22ac603b10ef3da8dd025c8d85279cbcd90ff061)
+++ b/src/odin/server/update.rs	(date 1621729044090)
@@ -65,7 +65,10 @@
   // Shutdown the server if it's running
   let server_was_running = server::is_running();
   if server_was_running {
-    server::blocking_shutdown();
+    if let Err(e) = server::blocking_shutdown() {
+      error!("Failed to stop server: {}", e);
+      exit(1);
+    }
   }
 
   // Update the installation
Index: src/odin/server/shutdown.rs
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
diff --git a/src/odin/server/shutdown.rs b/src/odin/server/shutdown.rs
--- a/src/odin/server/shutdown.rs	(revision 22ac603b10ef3da8dd025c8d85279cbcd90ff061)
+++ b/src/odin/server/shutdown.rs	(date 1621729085757)
@@ -4,10 +4,12 @@
 use std::{thread, time::Duration};
 
 use crate::constants;
+use crate::errors::TimeoutError;
 
-pub fn blocking_shutdown() {
+pub fn blocking_shutdown() -> Result<(), TimeoutError> {
   send_shutdown_signal();
-  wait_for_exit();
+
+  wait_for_exit()
 }
 
 pub fn send_shutdown_signal() {
@@ -32,9 +34,10 @@
   }
 }
 
-fn wait_for_exit() {
+fn wait_for_exit() -> Result<(), TimeoutError> {
   info!("Waiting for server to completely shutdown...");
   let mut system = System::new();
+  let mut waiting_for_seconds = 0;
   loop {
     system.refresh_all();
     let processes = system.get_process_by_name(constants::VALHEIM_EXECUTABLE_NAME);
@@ -43,7 +46,14 @@
     } else {
       // Delay to keep down CPU usage
       thread::sleep(Duration::from_secs(1));
+
+      waiting_for_seconds += 1;
+      if waiting_for_seconds > 300 {
+        return Err(TimeoutError {});
+      }
     }
   }
-  info!("Server has been shutdown successfully!")
+  info!("Server has been shutdown successfully!");
+
+  Ok(())
 }
Index: src/odin/commands/stop.rs
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
diff --git a/src/odin/commands/stop.rs b/src/odin/commands/stop.rs
--- a/src/odin/commands/stop.rs	(revision 22ac603b10ef3da8dd025c8d85279cbcd90ff061)
+++ b/src/odin/commands/stop.rs	(date 1621729044103)
@@ -15,6 +15,9 @@
       error!("Failed to find server executable!");
       exit(1);
     }
-    server::blocking_shutdown();
+    if let Err(e) = server::blocking_shutdown() {
+      error!("Failed to stop server: {}", e);
+      exit(1);
+    }
   }
 }
Index: src/odin/errors/mod.rs
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
diff --git a/src/odin/errors/mod.rs b/src/odin/errors/mod.rs
--- a/src/odin/errors/mod.rs	(revision 22ac603b10ef3da8dd025c8d85279cbcd90ff061)
+++ b/src/odin/errors/mod.rs	(date 1621729044109)
@@ -13,3 +13,14 @@
     write!(f, "VariantNotFound: {}", &self.v)
   }
 }
+
+#[derive(Debug)]
+pub struct TimeoutError {}
+
+impl error::Error for TimeoutError {}
+
+impl Display for TimeoutError {
+  fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
+    write!(f, "Timeout")
+  }
+}

anonymus19941 avatar May 23 '21 00:05 anonymus19941

Hey @anonymus19941 thanks for your work on this, I am looking into some shutdown work atm and trying to find a good way to handle it. My goal is to survey the running programs -> pass to a function to process them and check permissions -> if has permissions then shut down the process and ensure its shutdown with a timeout trap similar to how you have outlined :)

mbround18 avatar Jun 14 '21 07:06 mbround18

Is this still an active issue?

fclante avatar Feb 26 '24 13:02 fclante

@fclante not really since its forced to run as steam in the dockerfile additionally there have been many improvements since it was an issue. I think its safe to close but i probably should add a root check to the binary just in case its ran elsewhere.

mbround18 avatar Apr 21 '24 04:04 mbround18