
[Enhancement] Adding TRIM support after x jobs have run

Open • michael-pptf opened this issue 4 years ago • 6 comments

Hello,

As I am plotting away happily with Plotman, I did notice a serious performance degradation of my IO. Disk IO would drop to the 150 MB/s range, which is indicative of the SSD needing to do read-modify-write cycles on already-used cells to fit in new temporary plots.

After I discovered this behavior, I stopped all plotting processes immediately (Plotman could probably add a "kill all" command) and issued sudo fstrim -a -v. This tells the SSD which blocks left behind by deleted temporary plots are actually free, so the controller can reclaim them. Now my SSD is back to normal, but I do expect it will get filled up again, even though the temporary plots are finalized and moved to the destination folder.

What I am proposing here is that Plotman add fstrim to the existing framework. Since Plotman already has phase monitoring built in, it would be a hell of a lot easier to add a "TRIM after x jobs completed" option there. For a 1TB SSD, this could be set to 3, meaning fstrim runs once for every 3 jobs completed.

More about fstrim here: https://man7.org/linux/man-pages/man8/fstrim.8.html#:~:text=fstrim%20is%20used%20on%20a,unused%20blocks%20in%20the%20filesystem.
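
For concreteness, here is a minimal sketch of what a "trim after N completed jobs" hook could look like. This is not actual Plotman code; the names (on_job_completed, TRIM_EVERY_N_JOBS) are made up for illustration.

```python
# Hypothetical sketch only: run fstrim on the tmp filesystem after
# every N completed plot jobs.
import subprocess

TRIM_EVERY_N_JOBS = 3        # e.g. 3 completed jobs per 1TB of tmp SSD
_completed_since_trim = 0

def on_job_completed(tmp_mountpoint: str) -> None:
    """Call this wherever the plotting framework notices a finished job."""
    global _completed_since_trim
    _completed_since_trim += 1
    if _completed_since_trim >= TRIM_EVERY_N_JOBS:
        # fstrim needs root, so in practice this would go through sudo
        # or a privileged helper rather than being run directly.
        subprocess.run(["sudo", "fstrim", "-v", tmp_mountpoint], check=False)
        _completed_since_trim = 0
```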

michael-pptf avatar Apr 25 '21 22:04 michael-pptf

The general recommendation is to mount SSDs with the discard option. You can also use the fstrim timer service if you prefer periodic trims. There is some debate about optimal scheduling of trims, but I'm skeptical that Plotman is where trim scheduling should happen. Let me try to loop in an expert.

ericaltendorf avatar Apr 26 '21 00:04 ericaltendorf

It is very simple: Chia writes a lot of data - an amount that no sane person will ever write consistently for days on end. Mounting the SSD with the discard flag is great and all, but it definitely was not designed with a day in mind when a group of Linux operators would be consistently writing hundreds of TBs of data per day into their little NVMe SSDs.

If I plot 7 jobs in parallel, a 2TB NVMe SSD gets filled in a bit over 6 hours. At that rate, every page of the SSD will have been written to, and there will be no clean pages left for the next job. I mean - if Plotman can do it, that'd be great. If not, everyone plotting out there needs to do this manually anyway if they want their temp drive to last a little bit longer.

michael-pptf avatar Apr 26 '21 02:04 michael-pptf

I understand lots of data is being written. My understanding is that discard addresses this. See e.g. https://wiki.archlinux.org/index.php/Solid_state_drive#Continuous_TRIM and the "enabling TRIM" section of https://chiadecentral.com/chia-blockchain-ssd-buying-guide/

I'm happy to be proven wrong, but this is not just a Plotman thing: if more frequent trims were helpful, that would apply to all Chia plotting, and in the discussions of trim and discard I've been a part of over the past year I have never heard anyone recommend more frequent manual trimming. If it were helpful to trim more often than discard does, I'd like to see some additional evidence of that.

Check with storage_jm@ on keybase; he's the one that wrote the chia decentral guide and has a fair bit of knowledge of the various trim implementations.

ericaltendorf avatar Apr 26 '21 02:04 ericaltendorf

Would the option of configurable callbacks/hooks at different stages allow for something like this in a general sense? If so, we could close in favour of https://github.com/ericaltendorf/plotman/issues/40

BasilHorowt avatar May 09 '21 06:05 BasilHorowt

plotman could read the fstab file and see if discard has been added as an option.

If discard has not been added, I agree with the OP that running fstrim once every 3 plots per 1TB of SSD is a sensible thing for Plotman to do in order to keep the SSD drives at peak performance.
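
As a rough illustration (not actual Plotman code; fstab_has_discard is a made-up helper), the fstab check could be as simple as:

```python
def fstab_has_discard(mountpoint: str, fstab_path: str = "/etc/fstab") -> bool:
    """Return True if the fstab entry for `mountpoint` includes the discard option."""
    with open(fstab_path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#"):
                continue
            fields = line.split()
            # fstab fields: device, mount point, fs type, options, dump, pass
            if len(fields) >= 4 and fields[1] == mountpoint:
                return "discard" in fields[3].split(",")
    return False
```

Plotman could then only schedule its own fstrim runs when this returns False for the configured tmp directory.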

[screenshot: systemd fstrim.timer defaults]

Note that the default for the fstrim timer is once a week. That's about 14x too infrequent if you plot 3 plots every 12 hours: trimming after every 3 plots at that rate would mean roughly 14 trims per week instead of one. Like the OP said, it was not designed with Chia plotters in mind.

The casual plotter won't know what trimming is and won't have read about it before mounting their drive, so they won't have added discard.

I can totally see how adding fstrim to Plotman would be a real gain for ppl.

blundell avatar Jun 04 '21 09:06 blundell

Automatically applying the wrong solution doesn't seem like a thing we need to add. I do imagine at some point adding a plotman config check command which could reasonably comment on tmp drive configuration issues such as this.
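
Just to sketch the idea (a plotman check command does not exist today, and check_tmp_mount is invented for illustration), such a commentary pass might look roughly like:

```python
def check_tmp_mount(mountpoint: str) -> None:
    """Warn if the tmp mountpoint is not currently mounted with discard."""
    with open("/proc/mounts") as f:
        for line in f:
            device, mnt, fstype, options, *rest = line.split()
            if mnt == mountpoint:
                if "discard" not in options.split(","):
                    print(f"warning: {mountpoint} is mounted without 'discard'; "
                          f"consider continuous TRIM or a periodic fstrim schedule")
                return
    print(f"warning: {mountpoint} does not appear to be a mount point")
```

Checking /proc/mounts rather than /etc/fstab catches drives that were mounted by hand with different options.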

altendky avatar Jun 04 '21 13:06 altendky