go-criu
Restore shelljob in go-criu
I am trying to figure out how to restore shell jobs in go-criu. Inside my Go code, I'm launching a child process and dumping it. When I try to restore it, I get the following error every time.
restore failed: operation failed (msg:Error (criu/tty.c:991): tty: Don't have tty to inherit session from, aborting err:0)
As far as I understand, it is trying to inherit the shell session that the Go program is running in, so it fails. Am I missing a simple way to do this? I am setting the shelljob parameter on both dump and restore, and have also tried setting setsid on the child process, but that isn't doing anything. Thanks!
Are you using the -j/--shell-job flag while invoking CRIU? Also, could you please share the debug logs? You can generate them with the -v4 -o dump.log flags.
Yes. I have been trying different options, but these are essentially the options I use for dump and restore. I have attached the dump and restore logs.
// Dump options
opts := &rpc.CriuOpts{
	Pid:          proto.Int32(int32(pid)),
	ImagesDirFd:  proto.Int32(int32(img.Fd())),
	LogLevel:     proto.Int32(4),
	ShellJob:     proto.Bool(true),
	LogToStderr:  proto.Bool(true),
	LeaveRunning: proto.Bool(true),
	LogFile:      proto.String("dump.log"),
}
// Restore options
opts := &rpc.CriuOpts{
	ImagesDirFd: proto.Int32(int32(img.Fd())),
	LogLevel:    proto.Int32(4),
	ShellJob:    proto.Bool(true),
	LogFile:     proto.String("restore.log"),
}
The dump works fine and I can restore it later manually from the command line, but the restore in Go crashes with the error above. In case it was unclear, I am also trying to restore it within the same Go program. I attached the parent and child code too. Thanks for the quick response! logs_zipped.zip
When you start a child process from within the Go code, it does not directly have access to the stdin/stdout/stderr of the TTY from which you are running the main program. You are explicitly setting the "current" stdout/stderr for the child program in the following lines:
// newparent.go:149
cmd.Stdout = os.Stdout
cmd.Stderr = os.Stderr
When CRIU is restoring the child process, these file descriptors are not recognised as a terminal/TTY since the child never had direct access to it in the first place (at least, I think this is how it works, my explanation might be wrong).
One way to make this work is by using a pipe to connect the stdin/stdout/stderr of the parent process to the child process.
// newparent.go:149
_, err := cmd.StdinPipe()
// Handle error
_, err = cmd.StdoutPipe()
// Handle error
_, err = cmd.StderrPipe()
// Handle error
Making this change locally allowed me to successfully restore the child from within the parent process, like how you're trying to do.
Not sure if this is related, but crun uses a callback to get an FD from the restored process from CRIU.
https://github.com/containers/crun/blob/dd52246b02b374330a6a747d57da9a8f326d7cba/src/libcrun/criu.c#L183
I do not think the Go interface exposes this callback, and I am not sure it is related, but it reminded me a bit of this request. Maybe it helps.
runc also implements something similar using the orphan-pts-master hook.
Thanks, it seems to work. I suspected it was either some Go technicality I didn't know about or the --external option for CRIU, but the documentation isn't very user-friendly. I have another question about the output, though. I ended up with a startBinary function like this:
func startBinary(target ...string) (*exec.Cmd, bytes.Buffer) {
	cmd := exec.Command(target[0], target[1:]...)
	var stdBuffer bytes.Buffer
	mw := io.MultiWriter(os.Stdout, &stdBuffer)
	cmd.Stdout = mw
	cmd.Stderr = mw
	cmd.SysProcAttr = &syscall.SysProcAttr{
		Ptrace: true,
		Setsid: true,
	}
	if err := cmd.Start(); err != nil {
		log.Fatal("Failed to start child process:", err)
	}
	go func() {
		log.Println(stdBuffer.String())
	}()
	return cmd, stdBuffer
}
Like so, I get the output to the terminal as it comes, but it doesn't seem to continue after the restore. Is there some trick for this too?
EDIT: It looks like the process stays in state tsl after ATTACH and CONT; TracerPid seems to be the same, though.
Hi. I restructured the code a bit and realized that the restored process is never actually restored, even though CRIU reports success. When SIGSTOP is sent to the process before the dump, it revives and lives until one or two CONT signals are sent to it. When I don't send the SIGSTOP, on the other hand, CRIU says the restore succeeded, but the process dies instantly, even though it cannot be exiting on its own. I am really at a loss here. I am quite sure it has to do with the pipes, but I am running out of ideas. Do you have any idea why this could be? problem.zip
I ran into the same problem. I created a simple looping process from a small C++ program that prints numbers to the terminal. The dump worked fine, but when I tried to restore the process, the same error occurred. I can restore it manually with the criu command line, but I cannot do it with go-criu.
So I managed to work around this by starting a TTY in Go and launching the restore from there manually. There is probably a better way with some external sockets or something, but I didn't have the capacity to find it then.
import "github.com/creack/pty"
cmd := exec.Command("/usr/local/sbin/criu", "restore", "-v4", "-o", "restore.log", "-j", "--tcp-established", "-D", checkpointDir)
f, err := pty.Start(cmd)