remotePARTS icon indicating copy to clipboard operation
remotePARTS copied to clipboard

High core utilization during CRAN submission.

Open morrowcj opened this issue 1 year ago • 12 comments

After my second attempt to submit to CRAN, we were flagged for the following note:

* checking examples ... [49s/6s] NOTE
Examples with CPU (user + system) or elapsed time > 5s
            user system elapsed
partGLS 43.138  1.395   2.815
Examples with CPU time > 2.5 times elapsed time
            user system elapsed ratio
partGLS 43.138  1.395   2.815 15.82

The offending function must be fitGLS_partition(), but I have been unable to replicate CRAN's results on any of my test machines. If others could run the following code with the newest version of the package, it might provide some insight.

library(remotePARTS)
data(ndvi_AK10000)
df = ndvi_AK10000[seq_len(1000), ]
pm = sample_partitions(nrow(df), npart = 3)

proc_ratio <- function(expr){
  tab = system.time(expr)
  ratio = (tab["user.self"] + tab["sys.self"]) / tab["elapsed"]
  return(ratio)
}

test_1 <- proc_ratio(fitGLS_partition(formula = CLS_coef ~ 0 + land, partmat = pm, data = df, 
                                      nugget = 0))
test_2 <- proc_ratio(fitGLS_partition(formula = CLS_coef ~ lat, partmat = pm, data = df, 
                                      nugget = 0))
test_3 <- proc_ratio(fitGLS_partition(formula = CLS_coef ~ 0 + land, partmat = pm, data = df, 
                                      nugget = NA))
test_4 <- proc_ratio(multicore_fitGLS_partition(formula = CLS_coef ~ 0 + land, partmat = pm, 
                                                data = df, nugget = 0, ncores = 2))
test_5 <- proc_ratio(fitGLS_partition(formula = CLS_coef ~ 0 + land, partmat = pm, data = df, 
                                      nugget = 0, ncores = 2, parallel = TRUE))
test_6 <- proc_ratio(fitGLS_partition(formula = CLS_coef ~ 0 + land, partmat = pm, data = df, 
                                      nugget = 0, ncores = 2, parallel = FALSE))

all_tests <- c(test_1, test_2, test_3, test_4, test_5, test_6)

all_tests
any(all_tests >= 2.5) # did any tests use more than 2 cores?

morrowcj avatar Sep 11 '23 16:09 morrowcj

This may be related to those problems found by @kelewinska (#20).

A test to this effect should also be implemented (#5)

morrowcj avatar Sep 11 '23 16:09 morrowcj

I just re-installed [from GitHub]. I had no problem running the code, but got the warning:

The legacy packages maptools, rgdal, and rgeos, underpinning the sp package, which was just loaded, will retire in October 2023. Please refer to R-spatial evolution reports for details, especially https://r-spatial.org/r/2023/05/15/evolution4.html. It may be desirable to make the sf package available; package maintainers should consider adding sf to Suggests:. The sp package is now running under evolution status 2 (status 2 uses the sf package in place of rgdal)

The output was

all_tests user.self user.self user.self user.self user.self user.self 0.9920683 1.0006812 1.0000000 0.1160185 0.1119403 0.9993056

any(all_tests >= 2.5) # did any tests use more than 2 cores? [1] FALSE

arives avatar Sep 11 '23 16:09 arives

I got this output: | | 0%The legacy packages maptools, rgdal, and rgeos, underpinning the sp package, which was just loaded, will retire in October 2023. Please refer to R-spatial evolution reports for details, especially https://r-spatial.org/r/2023/05/15/evolution4.html. It may be desirable to make the sf package available; package maintainers should consider adding sf to Suggests:. The sp package is now running under evolution status 2 (status 2 uses the sf package in place of rgdal) |======================================================================| 100% |======================================================================| 100% |======================================================================| 100% |======================================================================| 100% user.self user.self user.self user.self user.self user.self 8.18373299 12.14780876 11.41590013 0.54575342 0.03631961 9.96156689 [1] TRUE I am running it on Linux in the latest version of a docker container I put together 2 weeks ago (https://hub.docker.com/r/kelewinska/rparts). I also restricted the docker container to use only 20 cores (other things are running on the server).

kelewinska avatar Sep 11 '23 17:09 kelewinska

@arives , thanks for doing that!

Yes, I've been getting those warnings too. I'm pretty sure they are due to us using the package geosphere, which itself has those dependencies.

@kelewinska, that is very interesting. Which linux distro are you using? Can you see if you get the same results with the development version?

devtools::install_github("morrowcj/remotePARTS", ref = "develop")

morrowcj avatar Sep 11 '23 17:09 morrowcj

@morrowcj cat /etc/os-release command PRETTY_NAME="Debian GNU/Linux trixie/sid" NAME="Debian GNU/Linux" VERSION_CODENAME=trixie ID=debian HOME_URL="https://www.debian.org/" SUPPORT_URL="https://www.debian.org/support" BUG_REPORT_URL="https://bugs.debian.org/" cat: command: No such file or directory

Sure! I will put together a container for a developer version. Should have a moment later today.

kelewinska avatar Sep 11 '23 17:09 kelewinska

I just checked, geosphere depends upon sp. I expect the maintainers of that package will update it soon (hopefully).

morrowcj avatar Sep 11 '23 17:09 morrowcj

@morrowcj The same block of code exacted with developer version of remotePARTS also yields TRUE:

| | 0%The legacy packages maptools, rgdal, and rgeos, underpinning the sp package, which was just loaded, will retire in October 2023. Please refer to R-spatial evolution reports for details, especially https://r-spatial.org/r/2023/05/15/evolution4.html. It may be desirable to make the sf package available; package maintainers should consider adding sf to Suggests:. The sp package is now running under evolution status 2 (status 2 uses the sf package in place of rgdal) |======================================================================| 100% |======================================================================| 100% |======================================================================| 100% |======================================================================| 100% user.self user.self user.self user.self user.self user.self 5.01672640 7.35580264 9.00376176 0.73052363 0.03512133 10.20821326 [1] TRUE

kelewinska avatar Sep 11 '23 18:09 kelewinska

Thank you, @kelewinska! Your results actually tell us something interesting: both times, tests 4 and 5 both stayed under the limit. That means that the parallel=TRUE version (which calls multicore_fitGLS_partition()) works as expected. What I think it means is that, for some reason, the C++ code is ignoring the ncores argument on your machine - but I'm not sure why. It likely has something to do with C++ compilers.

Can you give me the results of the shell command gcc --version. I'll look into this more soon.

morrowcj avatar Sep 12 '23 02:09 morrowcj

Note to self: check into local Makevars and its different options.

morrowcj avatar Sep 12 '23 02:09 morrowcj

Sure thing @morrowcj ! in a docker container the compiler is as follows

gcc (Debian 13.2.0-2) 13.2.0
Copyright (C) 2023 Free Software Foundation, Inc.

the server however runs the following gcc:

gcc (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0
Copyright (C) 2019 Free Software Foundation, Inc.

Since the container is an isolated instance, the server setup should not really matter.

I hope it helps. Beside the flags from CRAN, it could be that the docker container is incomplete or flawed. I have very little experience with it (yet) and settled on a version that was not throwing errors at me while compiling the vignette. So some of the issues could potentially arise from this.

I am happy to run more tests, just let me know :)

kelewinska avatar Sep 12 '23 06:09 kelewinska

@kelewinska, are you able to run the test on the server itself? If so, do you get the same results?

The next test I'd ask for is for you to create a ~/.R/Makevars file with the following contents (in docker if that's where you're running your test from).

PKG_LIBS=-fopenmp
PKG_CXXFLAGS=-fopenmp

That ensures that your C++ compiler (gcc) uses openMP when installing R packages, which is the library that the parallel functionality depends on. Then you'll need to reinstall the package before rerunning the tests.

If that doesn't work, I'm at a loss (for now).

morrowcj avatar Sep 12 '23 20:09 morrowcj

@morrowcj when i run the code block on a server I am also getting TRUE:

user.self   user.self   user.self   user.self   user.self   user.self 
32.72179813 37.05383637 33.51636504  0.03923648  0.03025866 23.94307073 
> any(all_tests >= 2.5) # did any tests use more than 2 cores?
[1] TRUE

I did a 'quick and dirty' test by tinkering with Makevars inside the container, and then removing and reinstalling remotePARTS by hand (it is not a 'sustainable' way to do it ;) ) and the results are again

 user.self  user.self  user.self  user.self  user.self  user.self
8.51681957 9.30970776 8.87753251 0.03681319 0.04181818 6.78893741
[1] TRUE

A side note: i tried to add the Makevars in to docker image blueprint, and upon this occasion discovered that during the past 2 weeks there were some updates and the base docker image is crushing upon build due to failing dependencies. I will look into it in the following days - maybe some updates will sort the main problem out?

kelewinska avatar Sep 13 '23 08:09 kelewinska