plotng Resume work (pick up running chia plots create processes)

Hi. It would be really nice for plotng to resume from where it left off after a version upgrade, or even to bring any chia plots create processes already running on the system into the UI, so that new plots don't compete with the older ones for resources.

Can you please add resume support to plotng when using the same config file? Thanks :)

May 27 '21 19:05 carlosvsilva

I have a couple of ideas on how to do this one, not entirely sure it'll work but I'll give it a crack.

It would definitely be good to allow restarting the server process, as there are some data structure changes I want to make, but I'm very aware of the impact on plotting when the server process is bounced.

Jun 02 '21 13:06 squizzling

Been thinking about this. I don't think it is possible. If the PlotNG process is killed, then all the processes it created, i.e. all the chia plots create processes, will also be killed. There's really no way around this.

Jun 02 '21 15:06 maded2

I think the last time I killed plotng, the processes kept running (probably with parent id = init/systemd). That's why I suggest picking up running processes as plotng starts, and syncing state by reading their logs.

Jun 02 '21 17:06 carlosvsilva

The plotters keep running, but things get messed up because the server isn't reading logs anymore. My plan is to redirect the logs to a file, instead of having the server read/write the logs. It can then pick up logs directly from disk, and optionally resume.

Jun 03 '21 00:06 squizzling

However, if PlotNG exits, then the file descriptor to the running plot processes is lost. I think the plot will hang as well.

Jun 03 '21 01:06 maded2

That's because the plotter is writing to the plotng-server's pipes. The change will be to have the plotter write to disk, and then plotng-server reads from disk. There won't be a connection anymore.

Jun 03 '21 01:06 squizzling

But there's always a process which writes the output to a file. On the command line, it is the shell process. For PlotNG, it is PlotNG itself. If the parent process dies, the file descriptor to stdout/stderr will be lost.

Jun 03 '21 01:06 maded2

All the plotter knows is to write to its stdout (fd 1). When you run the plotter from the commandline with nothing else going on, the stdout will go to the shell which passes it to the terminal. When you run it like "chia plots > foo.log", then the shell will open foo.log, and set the fd 1 of the child process to the opened file. At that point, the shell isn't required anymore.

This will do the same, it's basically:

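logFile, _ := os.Create("foo.log") // error handling omitted for brevity
cmd.Stdout = logFile                // the child's fd 1 will point at foo.log
cmd.Start()
_ = logFile.Close()                 // the child keeps its own inherited descriptor
go watchLogFile("foo.log")          // the server follows the file from disk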

There's a bunch more to it of course, but that's the general theme.

Jun 03 '21 01:06 squizzling

Rather than looking at it from the perspective of ctrl-c or killing the process, would it make more sense to look at it as a more graceful shutdown of the process?

A 'hotkey' press that would dump all the necessary data to a file that could be resumed later perhaps...

It wouldn't solve for a kill or anything abrupt but it would be a start.

Jun 04 '21 02:06 clmarshall

Saving and re-parsing log files is working so far. The primary downsides are dealing with partial log files (i.e., the plotter stopped midway through), and losing the ability to kill a process through the server (because you don't know the pid of a resumed log).

My current thoughts are to have "owned" (things we created) and "orphans" (things we resumed), and we can only kill things that are owned.

The state for orphans would be adopting -> waiting for data (with a timeout, maybe 15 minutes) -> abandoned / adopted, depending on whether anything is written to the log.

Jun 04 '21 02:06 squizzling

I think we shouldn't put too much effort into this, as it will make the whole tool very complex. I think the current message to the user is that if you need to upgrade, then do a full restart. It's not a big deal (most of the time PlotNG doesn't need to be upgraded; I personally have had a PlotNG server running for weeks).

If we make this tool too complex, then there are so many ways it can fail to work as expected.

I'd prefer we spend time on making sure the staggering / delays work as expected (#65).

Jun 04 '21 02:06 maded2

I've largely avoided touching the server because I don't want to interrupt plotting. Making plotting resumable makes it easier to work on that, even with extra complexity.

Jun 04 '21 02:06 squizzling

@squizzling

Another thought... I remember using a feature of VirtualBox that would suspend the state of the VM.

Docker has a similar capability with docker pause.

Perhaps one of these might work as a free solution until a native option presents itself?

Additionally, while looking up the docker command TIL it uses a kernel capability called cgroup freeze.

https://www.kernel.org/doc/Documentation/cgroup-v1/freezer-subsystem.txt

Edit: missed a word

Jun 04 '21 12:06 clmarshall

I've been considering how to do pausing, because it would make it possible to enforce a strict limit on each phase. I know how to do it under Windows (at least in theory, I've never done it from Go), and while I was unaware of the cgroup method for Linux, that looks like a good starting point. I have zero ability to apply it to macOS, though.

That aside, my main goal on resuming is to do rapid changes on the server. Developing the ability to freeze is hampered by the inability to restart the server without losing work.

Jun 04 '21 12:06 squizzling

In terms of pausing, it should be relatively easy to do. I really think we should drop this thread on resume work. If we can correctly process config updates, there's almost no reason to stop the server.

Jun 04 '21 13:06 maded2
