plotng
Resume work (pick up running chia plots create processes)
Hi. It would be really nice for plotng to resume from where it left off after a version upgrade or even bring into the UI if there are already any chia plots create processes running in the system, so as to not overstep the older ones resource-wise.
Can you please add resume support to plotng when using the same config file? Thanks :)
I have a couple of ideas on how to do this one, not entirely sure it'll work but I'll give it a crack.
It would definitely be good to allow restarting the server process, as there's some data structure changes I want to make, but I'm very aware of the impact to plotting when the server process is bounced.
Been thinking about this. I don't think it is possible. If the PlotNG process is killed, then all the processes it created, i.e. all the chia plots create processes, will also be killed. There's really no way around this.
I think last time I killed plotng the processes kept running (probably with parent id = init/systemd). That's why I suggest to pick up on running processes as plotng starts, and sync state by reading their logs.
The plotters keep running, but things get messed up because the server isn't reading logs anymore. My plan is to redirect the logs to a file, instead of having the server read/write the logs. It can then pick up logs directly from disk, and optionally resume.
However, if PlotNG has exited, then the file descriptor to the running plot processes is lost. I think the plot will hang as well.
That's because the plotter is writing to the plotng-server's pipes. The change will be to have the plotter write to disk, and then plotng-server reads from disk. There won't be a connection anymore.
But there's always a process which writes the output to a file. On the command line, it is the shell process. For PlotNG, it is PlotNG itself. If the parent process dies, the file descriptor for stdout/stderr will be lost.
All the plotter knows is to write to its stdout (fd 1). When you run the plotter from the command line with nothing else going on, it inherits the shell's stdout, which is the terminal. When you run it like "chia plots > foo.log", the shell opens foo.log and sets fd 1 of the child process to the opened file. At that point, the shell isn't required anymore.
This will do the same, it's basically:
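logFile, _ := os.Create("foo.log") // error handling omitted for brevity
cmd.Stdout = logFile               // the child gets this file as its fd 1
cmd.Start()
_ = logFile.Close()                // the server's copy is no longer needed
go watchLogFile("foo.log")         // read progress back from disk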
There's a bunch more to it of course, but that's the general theme.
Rather than looking at it from the perspective of ctrl-c or killing the process, would it make more sense to look at a more graceful shutdown of the process?
A 'hotkey' press that would dump all the necessary data to a file that could be resumed later perhaps...
It wouldn't solve for a kill or anything abrupt but it would be a start.
Saving and re-parsing log files is working so far. The primary downsides are dealing with partial log files (i.e., the plotter stopped midway through) and losing the ability to kill a process through the server (because you don't know the pid of a resumed log).
My current thoughts are to have "owned" (things we created) and "orphans" (things we resumed), and we can only kill things that are owned.
The state for orphans would be adopting -> waiting for data (with a timeout, maybe 15 minutes) -> abandoned / adopted, depending on whether anything is written to the log.
I think we shouldn't put too much effort into this, as it will make the whole tool very complex. I think the current message to the user is that if you need to upgrade, then do a full restart; it's not a big deal (most of the time PlotNG doesn't need to be upgraded; I personally have had a PlotNG server running for weeks).
If we make this tool too complex, then there are so many ways it can fail to work as expected.
I'd prefer that we spend time making sure the staggering / delays work as expected (#65).
I've largely avoided touching the server because I don't want to interrupt plotting. Making plotting resumable makes it easier to work on that, even with extra complexity.
@squizzling
Another thought... I remember using a feature of VirtualBox that would suspend the state of the VM.
Docker has a similar capability with Docker Pause.
Perhaps one of these might work as a free solution until a native option presents itself?
Additionally, while looking up the docker command TIL it uses a kernel capability called cgroup freeze.
https://www.kernel.org/doc/Documentation/cgroup-v1/freezer-subsystem.txt
Edit: missed a word
I've been considering how to do pausing, because it would enable the ability to have a strict limit on each phase. I know how to do it under Windows (at least in theory, I've never done it from Go), and while I was unaware of the cgroup method for Linux, that looks like a good starting point. I have zero ability to apply it to macos, though.
That aside, my main goal on resuming is to do rapid changes on the server. Developing the ability to freeze is hampered by the inability to restart the server without losing work.
In terms of pausing, it should be relatively easy to do. I really think we should drop this thread on resuming work. If we can correctly process config updates, there's almost no reason to stop the server.