chia-plotter

Dual-socket configuration: only 1 CPU used, not both

Open • wtfiwinomgs opened this issue 3 years ago • 14 comments

As the title suggests, plotting with madmax only uses 1 CPU on a dual-socket server. How can I take advantage of both CPUs at the same time? This is on Windows Server 2012 R2.

Raising the thread count does not help either (i.e. 12 cores/24 threads per CPU, so two CPUs means 48 threads, but -r 48 still only uses 1 CPU).

wtfiwinomgs • Jul 07 '21 08:07

No issues here... it's been using both of my Xeons for weeks.

Mar-85 • Jul 07 '21 09:07

Can you show me your CLI command line, minus the keys? Also, are you on Linux or Windows?

wtfiwinomgs • Jul 07 '21 09:07

Yep, no worries. I'm on Linux and use the following: ./chia_plot -n 55 -r 35 -u 256 -t /mnt/8drive/ -2 /mnt/ram/ -p XXXX -f XXXX

Mar-85 • Jul 07 '21 09:07
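For reference, here is the command from the comment above with each option annotated, as a minimal bash sketch that only prints the command. The option meanings follow my reading of the madMAx `chia_plot --help` output rather than anything stated in this thread: `-n` is the number of plots made one after another (so `-n 55` is a sequential queue, not 55 at once), and `-d` sets the final directory, which defaults to the `-t` directory when omitted. The `/mnt/plots/` path is a hypothetical example, and the keys stay elided.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Annotated madMAx command (keys elided as XXXX). Dry run: it is only printed.
args=(
  -n 55            # number of plots, made one after another (a queue)
  -r 35            # number of threads
  -u 256           # number of buckets
  -t /mnt/8drive/  # temp dir 1
  -2 /mnt/ram/     # temp dir 2 (here a RAM disk)
  -d /mnt/plots/   # final dir (hypothetical path; defaults to the -t dir)
  -p XXXX          # pool public key (elided)
  -f XXXX          # farmer public key (elided)
)

echo ./chia_plot "${args[@]}"
```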

Thanks for that. A few questions:

-n 55 means 55 plots queued up, right? I'd assume so, because no one would have enough RAM to run them all at once unless they had 4TB+. If so, how do you run plots in parallel, say if you do have enough RAM for two or three at the same time with madmax? Which folder is the final directory: tmp1 (8drive) or tmp2 (ramdisk)? Also, if you want the final directory to be neither tmp1 nor tmp2, what option do you add to this command?

wtfiwinomgs • Jul 07 '21 09:07

No point doing 2 in parallel: it doesn't speed anything up, because the time per plot doubles, so overall throughput ends up the same.

SebMoore • Jul 07 '21 09:07

That's not true, especially for dual-socket systems. Processes that cross NUMA nodes suffer a small performance penalty, so running one job per NUMA node will be slightly faster than plotting one at a time. My dual-CPU system gets 25-minute plots on RAM, 30-minute plots on NVMe, and 37-44-minute plots when running 2 in parallel on NVMe, split across the nodes with numactl.

@wtfiwinomgs have you run other processes that use all threads? Do you have CPU affinity set in Task Manager? Is NUMA disabled in the BIOS?

gryan315 • Jul 07 '21 09:07
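The arithmetic behind the reply above, using only the figures quoted in it: when two plots finish together, the effective wall time per plot is the batch time divided by two. A minimal sketch (the `per_plot_minutes` helper is hypothetical):

```bash
#!/usr/bin/env bash
set -euo pipefail

# Effective wall-clock minutes per plot when N plots run and finish together.
# Usage: per_plot_minutes <batch_minutes> <plots_in_parallel>
per_plot_minutes() {
  awk -v t="$1" -v n="$2" 'BEGIN { printf "%g\n", t / n }'
}

echo "1 plot on NVMe:            $(per_plot_minutes 30 1) min/plot"
echo "2 in parallel, fastest:    $(per_plot_minutes 37 2) min/plot"
echo "2 in parallel, slowest:    $(per_plot_minutes 44 2) min/plot"
echo "break-even (time doubles): $(per_plot_minutes 60 2) min/plot"
```

So 37 to 44 minutes for two plots works out to 18.5 to 22 minutes per plot, versus 30 minutes for a single plot on NVMe. Only if the time exactly doubled (60 minutes for two) would you land on the 30-minute break-even described in the previous comment.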

OK, yep, you're totally correct; I forgot we were talking about a dual-socket system here. That's how I solved my problem as well: on my dual-Xeon system, I couldn't get per-core CPU usage above about 60%, even with everything on a ramdisk. Now I just run two plots in parallel (using the threads listed by numactl --hardware for each node) and get about 90% utilisation. Problem solved. Lucky I have 512GB of RAM to spare, though.

SebMoore • Jul 07 '21 09:07
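The per-node approach described above can be scripted roughly like this. It is a sketch under assumptions, not the poster's actual setup: it reads `numactl --hardware` output (lines of the form `node N cpus: ...`), counts the logical CPUs on each node, and prints one `numactl --cpunodebind=N --membind=N ./chia_plot ...` command per node with `-r` set to that count. The `/mnt/nvmeN/` temp directories are hypothetical placeholders, `-n -1` asks madMAx to keep plotting indefinitely, and the keys stay elided.

```bash
#!/usr/bin/env bash
set -euo pipefail

# From `numactl --hardware` output on stdin, print "<node> <cpu_count>"
# for every "node N cpus: ..." line that lists at least one CPU.
node_cpu_counts() {
  awk '$1 == "node" && $3 == "cpus:" && NF > 3 { print $2, NF - 3 }'
}

# Dry run: print one NUMA-pinned chia_plot command per node.
# /mnt/nvmeN/ temp dirs and the XXXX keys are placeholders.
plan_per_node() {
  node_cpu_counts | while read -r node cpus; do
    echo numactl --cpunodebind="$node" --membind="$node" \
      ./chia_plot -n -1 -r "$cpus" -t "/mnt/nvme${node}/" -p XXXX -f XXXX
  done
}

# Typical use on the plotting machine:
#   numactl --hardware | plan_per_node
```

Binding memory as well as CPUs (`--membind`) is what keeps each plotter's working set on its own socket, which is the NUMA-crossing penalty mentioned earlier in the thread.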

Thanks for answering. I think NUMA is enabled, but when I run the CLI command on Windows, it only runs on 1 processor, so effectively NUMA is not being used? With Task Manager I can assign chia_plot.exe to node 0 or 1. Furthermore, the 768GB of RAM is split between both CPUs, so even with NUMA disabled, it still needs to fetch data from the other socket's RAM.

My question, though, is how to use both CPUs, all 48 threads (24 cores, 2x 12-core CPUs), on the same plot. Or would doing two plots, one on each CPU, be faster?

What do you suggest?

wtfiwinomgs • Jul 07 '21 10:07

Is that 768GB DRAM or PMEM? If you don't need to run any Windows-only apps, I recommend moving to Linux, using an LTS kernel if you intend to use the system as a 24/7 farmer, or a more recent kernel for better plotting performance. If you've got 768GB of DRAM, you could create either one large tmpfs or 2 separate ramdisks using the brd kernel module, then run two instances of Mad Max via numactl to bind each instance to one socket's CPU and RAM. I believe numactl will successfully isolate brd devices to a single socket, but I'm not sure how tmpfs would behave; it's worth testing.

gryan315 • Jul 07 '21 11:07
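A rough sketch of the two-RAM-disk idea above. It swaps in tmpfs for `brd`: the tmpfs `mpol=bind:N` mount option asks the kernel to allocate that filesystem's pages only from NUMA node N, which is one way to test the open question about how tmpfs behaves. The mount points, the `/mnt/nvmeN/` temp directories and the 110G size (madMAx's help text puts the `-2` directory at roughly 110 GiB) are assumptions; `-r 24` is one 12-core/24-thread socket from the original post. The script only prints the commands; running them needs root.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Dry run: print one NUMA-bound tmpfs plus one pinned plotter per node.
# Usage: plan_ramdisks <number_of_nodes> <tmpfs_size>
plan_ramdisks() {
  local nodes="$1" size="$2" node dir
  for node in $(seq 0 $((nodes - 1))); do
    dir="/mnt/ram${node}"   # hypothetical mount point
    echo mkdir -p "$dir"
    # mpol=bind:N asks tmpfs to allocate its pages only from NUMA node N
    echo mount -t tmpfs -o "size=${size},mpol=bind:${node}" tmpfs "$dir"
    # -r 24 = one 12-core/24-thread socket; /mnt/nvmeN/ and keys are placeholders
    echo numactl --cpunodebind="$node" --membind="$node" \
      ./chia_plot -r 24 -t "/mnt/nvme${node}/" -2 "${dir}/" -p XXXX -f XXXX
  done
}

plan_ramdisks 2 110G   # assumption: about 110 GiB per node for the -2 directory
```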

Sorry, I'm kind of new at this. The 768GB is DRAM, not PMEM: DDR4 with v3 CPUs. It takes us about 1.25 to 2 hours plotting two at the same time. I fail to see how people are able to do a plot in under 30 minutes unless they have a lot more cores at higher frequencies, or is it the kernel difference between Windows and Linux?

wtfiwinomgs • Jul 07 '21 15:07

There is a pretty significant difference between Windows and Linux. I haven't personally tested plotting, but every other heavy computational task I've run on this system was 10-20% faster on Linux than on Windows. Someone on Reddit also suggested that the software they were using to create the ramdisk might be slower than other options. Since tmpfs is built into the kernel and most Linux distributions use it for caching, it is actively developed and performs very well. Linux also has numactl and the XFS filesystem, with which I've seen a significant performance jump over other filesystems on both NVMe and spinning disks. There's also the fact that Mad Max is written for Linux and ported to Windows by stotiks. Try as he might, his port will usually be a little behind the latest releases, and it's unknown how the differences between the two OSes affect performance.

gryan315 • Jul 07 '21 17:07
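Setting up an NVMe temp drive with XFS, as mentioned above, takes only a couple of commands. In this sketch the device `/dev/nvme0n1` and mount point `/mnt/nvme0` are hypothetical, and the helper only prints the commands, because `mkfs.xfs` erases everything on the target device.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Dry run: print the commands that format a device as XFS and mount it.
# WARNING: mkfs.xfs destroys all data on the device; review before running.
plan_xfs() {
  local dev="$1" mnt="$2"
  echo mkfs.xfs "$dev"
  echo mkdir -p "$mnt"
  echo mount -t xfs "$dev" "$mnt"
}

plan_xfs /dev/nvme0n1 /mnt/nvme0   # hypothetical device and mount point
```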

Damn, that sounds amazing. I've seen people with Ivy Bridge Xeon v2s and DDR3 outperform my v3/DDR4 system by something like 5x. I'd assume it's Linux vs. Windows, then.

wtfiwinomgs • Jul 07 '21 19:07

There's even a pretty significant difference between kernel versions: generally, the newer the kernel, the better it performs (not always, but often). That's why on my dedicated farming PC I use Ubuntu LTS, for ease of software compatibility and not often needing to restart after updates, and on my plotting system I use Arch, which gets frequent kernel updates and keeps pushing performance up. I probably should have gone with RHEL, since my farmer is an HP and a lot of their enterprise software/drivers are made specifically for RHEL, but I'm more familiar with Debian and Arch, and I've managed to get by so far.

gryan315 • Jul 07 '21 20:07

Use CentOS 8.

farshidbahmani • Jul 26 '21 13:07