Parallel processing in Windows 11 and latest R version
Dear Jean-Baptiste Féret,
The new version of the package works perfectly for me in Windows 10 and R 4.2.2 with parallel processing:
```r
maxRows <- 10000              # nb of lines processed at once (adjust based on RAM available)
total_cores <- detectCores()
nbCPU <- total_cores - 1      # nb of threads for parallel processing

# 5- apply biodivMapR
divmetrics <- biodivMapR_full(
  input_raster_path = trait_files,
  input_mask_path   = mask_custom_path,
  output_dir        = output_dir,
  window_size       = window_size,
  Kmeans_info_save  = Kmeans_info_save,
  Beta_info_save    = Beta_info_save,
  nbclusters        = nbclusters,
  alphametrics      = c("richness", "shannon", "simpson"),
  FDmetric          = c("FRic", "FEve", "FDiv", "FDis", "FRaoq"),
  maxRows           = maxRows,
  nbCPU             = nbCPU,
  progressbar       = TRUE,
  filetype          = "GTiff"
)
```
However, using the same script, the system does not use more than 1 core in a clean install of Windows 11 with the latest R version (or R 4.2.2) and latest package dependencies.
Has anyone reported something similar? The same thing happens to a colleague of mine on Windows 11.
Best regards,
dear @jofeggg
thank you for raising this issue.
indeed, this is not a bug, but it is something I should modify, or at least document.
I assume you process a Sentinel-2 image or similar and set maxRows <- 10000 to match the number of lines of your image?
maxRows corresponds to the maximum number of rows to be processed at once by a CPU/thread. if the user defines maxRows >= number of rows of the image, then only one processor will be used, and the process may be very RAM intensive.
You should either not define maxRows (then the default value is 20*window_size), or define maxRows as nb_lines/nbCPU, where nb_lines is the number of lines of your image. This should work well if your raster does not include uneven masking and if you are confident about the RAM available on your computer. otherwise I suggest you do not define maxRows; the multithreaded computation should then smooth out the uneven distribution of your data.
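To illustrate, maxRows could be derived from the image size like this (a sketch only: `nb_lines` is an assumed Sentinel-2 tile height, 10980 rows, not a value the package provides; read it from your own raster):

```r
library(parallel)

# Hypothetical sketch: split the image evenly across threads.
# nb_lines = 10980 is an assumed Sentinel-2 tile height; replace it
# with the actual number of rows of your raster.
nbCPU    <- max(1, detectCores() - 1)
nb_lines <- 10980
maxRows  <- ceiling(nb_lines / nbCPU)   # one block per thread

# Or leave maxRows undefined: the default is 20 * window_size
window_size     <- 10
default_maxRows <- 20 * window_size     # 200 rows per block in this case
```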
maxRows is currently defined as
#' @param maxRows numeric. max number of rows in each block
but it should be defined as follows in a future update:
#' @param maxRows numeric. max number of rows processed once by each CPU
let me know if you think this is a better parameter definition, or if you have suggestions.
cheers, jb
Dear @jbferet
Thanks for your fast reply!
I didn’t set the maxRows parameter to 10,000 based on the size of the S2 image, but because it sped up processing significantly on a Windows 10 machine with an Intel i7 and 128 GB of RAM. On that PC, parallel processing with nbCPU = total_cores - 1 worked correctly and the CPU was almost at 100%.
However, on a Windows 11 machine with an Intel i9 and 192 GB of RAM, using the same configuration, it only uses one core (almost 7% CPU load), so I thought it might be related to Windows 11 or the newer versions of R.
This afternoon I’ll try the configuration you suggested to see if it fixes the issue.
All the best,
Right, I forgot parallel processing worked fine with the Windows 10 OS. So you confirm that, besides the Windows OS version, your R environment is identical (R version, biodivMapR version, imported package versions and so on)? The behavior you describe on Windows 11 seems logical based on the latest version of the code: maxRows has 'priority' over nbCPU. This should not be the case, in my opinion: if you define a number of CPUs to use, you want to take advantage of them and balance computation across CPUs, even when a large amount of RAM is available.
If the biodivMapR version was exactly the same in your Windows 10 test, I currently do not have an explanation for it. I will try to investigate. Meanwhile, can you test with maxRows set to nb_lines/nbCPU, or maxRows <- NULL, and tell me if parallel processing works properly and computation time is comparable to the Windows 10 process?
jb
On the Windows 11 PC I just tested:
- Latest version of R and all packages, including biodivMapR 2.3.4, with maxRows <- 10000: it shows 0% load on all threads except one, and total CPU usage is 8%.
- Latest version of R and all packages, including biodivMapR 2.3.4, with maxRows <- NULL: it shows 0% load on all threads except one, and even lower CPU usage than in the previous case, as expected.
- Same version of R (4.2.2) as on the Windows 10 PC and the same package versions, including biodivMapR 2.3.4 (on Windows 10 I was already using the latest version), for both maxRows options: the result is identical.
With this test code I found online, parallelization does work:
```r
library(parallel)
nb_cores <- detectCores()
cl <- makeCluster(nb_cores - 1)
res <- parLapply(cl, 1:1e8, function(x) 1)
stopCluster(cl)
```
All the best,
Thanks for your support, @jbferet ,
I’ve made some interesting discoveries:
After updating the BIOS of my motherboard and Intel drivers, parallel processing started working correctly on Windows 11. The mysteries of computing... With the same R versions and configurations, performance is similar between Windows 10 and Windows 11. However, after several tests, I’ve discovered some interesting things (see below).
- In my case, both on Windows 10 and Windows 11, the performance of biodivMapR is significantly better in R 4.2.2 than in R 4.5.0, given identical hardware and processing options (maxRows and nbCPU). To compute alpha, beta and functional diversity metrics from four functional traits for a 60,000 ha area on an i9 with 192 GB of RAM, R 4.2.2 takes about 5 minutes, while R 4.5.0 takes 15 minutes. That said, the difference may also be related to the versions of the libraries on which biodivMapR depends.
- Here’s the interesting part. As you rightly pointed out, @jbferet, the option maxRows = NULL gives the best performance, combined with a limited number of processor threads. The best performance for me is achieved using 4 to 8 threads on an i9. Beyond that, performance drops sharply. This also applies to a smaller scene of 5,000 ha. With nbCPU = 4–8, CPU load remains around 15% on a single core during the initial steps, up until the PCoA is performed. When nbCPU > 8, CPU load does not exceed 8%, and efficiency decreases. During the PCoA step, when using 4–8 threads, parallel processing is triggered more frequently, whereas with more than 8 threads, the time intervals between full parallel processing become longer, which increases the overall processing time.
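A quick way to reproduce this kind of comparison outside biodivMapR is to time a CPU-bound toy task at different thread counts (a rough sketch only: the toy workload below stands in for the real windowed computation, so absolute timings will not match biodivMapR's, but the scaling trend with ncores is comparable):

```r
library(parallel)

# Time a toy CPU-bound workload on 'ncores' workers (PSOCK cluster,
# so it behaves the same way on Windows 10 and Windows 11)
time_with <- function(ncores) {
  cl <- makeCluster(ncores)
  on.exit(stopCluster(cl))
  unname(system.time(
    parLapply(cl, 1:100, function(i) sum(rnorm(2e5)))
  )["elapsed"])
}

# Compare a few thread counts, e.g.:
# sapply(c(2, 4, 8), time_with)
```

If the elapsed time stops improving (or worsens) past a given thread count, the overhead of distributing blocks outweighs the extra parallelism, which matches the 4–8 thread sweet spot described above.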
I hope this is helpful.
Thank you very much for your help.
All the best,
thank you for your feedback. Parallel processing could be improved in biodivMapR, and the balance between the capacity to manage very large datasets and the efficiency of running reasonably large datasets is sometimes complex! Thank you for taking the time to push this further. I may get rid of the maxRows variable in a future update. The degraded performance when using more than 8 CPUs may be caused by the size of inputs and outputs during processing, or by something more related to the hardware/software side of your machine (as you experienced with your drivers). I may not have the skills to fix that. Some parts of the package, still undocumented, provide alternative ways to process images and should (I think) handle parallel processing much more efficiently. If you are interested, I can share some example scripts with you. Contact me via email, and we can discuss how to adapt the scripts to your study case. I will try to update the documentation as soon as I have time.
Cheers, jb
Yes, I believe the maxRows parameter could be hidden. The degraded performance with many threads is likely related to the input size and its interaction with maxRows, since other multi-threaded processes using different packages (e.g. machine learning) now make efficient use of all cores after the updates on my hardware side.
I’d love for you to share those scripts with me. I’ll contact you via email.
All the best,