
downscaleR: a single process uses too much CPU and is too slow. Why?

Open louwangzhiyuwhy opened this issue 3 years ago • 2 comments

options(java.parameters = "-Xmx8g")

library(climate4R.UDG)
library(loadeR)
library(loadeR.2nc)
library(transformeR)
library(climate4R.datasets)
library(downscaleR)
library(visualizeR)
library(VALUE)
library(climate4R.value)

vars <- c("var151","var165","var166") # psl; uas; vas
varp <- c("var131@85000","var132@85000","var129@50000") # 131-ua; 132-va; 130-ta; 129-zg
grid.list <- lapply(vars, function(x) {
  loadGridData(dataset = "/home/inspur/working/climate4r/ERA-I/box_surface_interim_1979_2018.nc",
               var = x,
               years = 1990:2018)
})
grid.listp <- lapply(varp, function(x) {
  loadGridData(dataset = "/home/inspur/working/climate4r/ERA-I/box_pressure_interim_1979_2018.nc",
               var = x,
               years = 1990:2018)
})
pred <- downscaleCV(xs, wsobs, folds = 3, sampling.strategy = "kfold.chronological",
                    scaleGrid.args = list(type = "standardize"),
                    method = "GLM",
                    prepareData.args = list(
                      "spatial.predictors" = list(which.combine = getVarNames(xs), v.exp = 0.9)))
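For readers unfamiliar with the `folds = 3, sampling.strategy = "kfold.chronological"` setting, the following is a minimal base-R sketch of what a chronological 3-fold split of the years 1990:2018 could look like. It is an illustration only: the exact fold boundaries used by downscaleCV may differ, and the names `fold_id` and `folds` are hypothetical.

```r
# Illustrative chronological k-fold split (not downscaleR's own code).
years <- 1990:2018
k <- 3

# Assign each year to one of k contiguous blocks, in time order.
fold_id <- cut(seq_along(years), breaks = k, labels = FALSE)
folds <- split(years, fold_id)

# Each fold is held out once; the model is trained on the remaining years.
for (i in seq_along(folds)) {
  test_years <- folds[[i]]
  train_years <- setdiff(years, test_years)
  cat(sprintf("fold %d: test %d-%d, train on %d years\n",
              i, min(test_years), max(test_years), length(train_years)))
}
```

With three folds the full training procedure runs three times, so the cost of one fit is roughly tripled.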

I was shocked to see that downscaleCV uses 12603% CPU in a single process and is still very slow. Why? The input is a 30*40 box of the ERA-I dataset, which should be small enough, so why does it take so many resources? Here is the output of top on CentOS 7:

PID    USER   PR NI VIRT   RES   SHR   S %CPU  %MEM TIME+     COMMAND
553757 inspur 20 0  105.6g 74.1g 28200 R 12603 7.4  440:22.97 R

louwangzhiyuwhy avatar Oct 15 '21 09:10 louwangzhiyuwhy

Hi,

Could you please share the dimensions of 'xs$Data' and 'wsobs$Data' with us? When using a GLM, we suggest setting model.verbose = FALSE to save memory (type ?glm.train in the R console). It can be passed to downscaleCV as an additional argument: downscaleCV(..., model.verbose = FALSE)
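As background for the memory suggestion, here is a small base-R sketch (synthetic data, not downscaleR internals) showing how much larger a complete fitted `glm` object is than the coefficients needed for prediction. How `model.verbose = FALSE` trims the stored model is documented in ?glm.train; the comparison below only illustrates the general effect, and all variable names are hypothetical.

```r
# Synthetic regression problem: 5000 time steps, 10 predictors.
set.seed(1)
n <- 5000
x <- matrix(rnorm(n * 10), n, 10)
y <- as.numeric(x %*% rnorm(10) + rnorm(n))
df <- data.frame(y = y, x)

# A full glm object keeps the model frame, residuals, fitted values, QR, etc.
fit_full <- glm(y ~ ., data = df, family = gaussian())

# The coefficients alone are enough to predict with a linear model.
fit_coef <- coef(fit_full)

size_full <- as.numeric(object.size(fit_full))
size_coef <- as.numeric(object.size(fit_coef))

cat("full glm object:  ", format(object.size(fit_full), units = "MB"), "\n")
cat("coefficients only:", format(object.size(fit_coef), units = "Kb"), "\n")
```

If one such object is kept per predictand location, the difference is multiplied by the number of locations.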

Please let us know if this speeds up the computation,

Cheers,

Jorge

jorgebanomedina avatar Oct 15 '21 14:10 jorgebanomedina

In my code, downscaleCV runs extremely slowly.

Here is my code:

options(java.parameters = "-Xmx8g")

library(climate4R.UDG)
library(loadeR)
library(loadeR.2nc)
library(transformeR)
library(climate4R.datasets)
library(downscaleR)
library(visualizeR)
library(VALUE)
library(climate4R.value)

vars <- c("var151","var165","var166") # psl; uas; vas
varp <- c("var131@85000","var132@85000","var129@50000") # 131-ua; 132-va; 130-ta; 129-zg

grid.list <- lapply(vars, function(x) {
  loadGridData(dataset = "/home/inspur/working/climate4r/ERA-I/box_surface_interim_1979_2018.nc",
               var = x,
               years = 1990:2018)
})

grid.listp <- lapply(varp, function(x) {
  loadGridData(dataset = "/home/inspur/working/climate4r/ERA-I/box_pressure_interim_1979_2018.nc",
               var = x,
               years = 1990:2018)
})

xs <- makeMultiGrid(grid.list)
xp <- makeMultiGrid(grid.listp)

wsobs <- loadGridData(dataset = "/home/inspur/working/climate4r/CCMP/box_CCMP_1990_2018_ws.nc", var = "ws")

pred <- downscaleCV(xs, wsobs, folds = 3, sampling.strategy = "kfold.chronological",
                    scaleGrid.args = list(type = "standardize"),
                    method = "GLM",
                    ncores = 20,
                    prepareData.args = list(
                      "spatial.predictors" = list(which.combine = getVarNames(xs), v.exp = 0.9)))

pred.p <- downscaleCV(xp, wsobs, folds = 3, sampling.strategy = "kfold.chronological",
                      scaleGrid.args = list(type = "standardize"),
                      method = "GLM",
                      ncores = 20,
                      prepareData.args = list(
                        "spatial.predictors" = list(which.combine = getVarNames(xp), v.exp = 0.9)))

To speed things up I added the argument ncores = 20, but it seems to make no difference. downscaleCV still takes 3 hours or more.
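The `spatial.predictors` argument with `v.exp = 0.9` asks for principal components that together explain 90% of the predictor variance. The sketch below shows that selection rule with base R's `prcomp` on synthetic data; it is not the package's own implementation, and the data and variable names are made up for illustration.

```r
# Synthetic "predictor field": 500 time steps x 60 grid cells,
# driven by 3 latent patterns plus noise.
set.seed(42)
n_time <- 500
n_grid <- 60
latent <- matrix(rnorm(n_time * 3), n_time, 3)
loadings <- matrix(rnorm(3 * n_grid), 3, n_grid)
x <- latent %*% loadings +
  matrix(rnorm(n_time * n_grid, sd = 0.5), n_time, n_grid)

# Standardize and compute principal components.
pca <- prcomp(x, center = TRUE, scale. = TRUE)

# Cumulative fraction of variance explained by the leading PCs.
cum_var <- cumsum(pca$sdev^2) / sum(pca$sdev^2)

# Keep the smallest number of PCs reaching the 0.9 threshold.
n_pcs <- which(cum_var >= 0.9)[1]
pcs <- pca$x[, seq_len(n_pcs), drop = FALSE]

cat("PCs retained:", n_pcs, "of", n_grid, "\n")
```

The retained PCs replace the full grid as regression inputs, so the number of predictors per model depends on how quickly `cum_var` reaches the threshold.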

Here is my data structure:

str(xs)
List of 4
 $ Variable:List of 2
  ..$ varName: chr [1:3] "var151" "var165" "var166"
  ..$ level : logi [1:3] NA NA NA
  ..- attr(*, "use_dictionary")= chr [1:3] "FALSE" "FALSE" "FALSE"
  ..- attr(*, "units")= chr [1:3] "" "" ""
  ..- attr(*, "longname")= chr [1:3] "var151" "var165" "var166"
  ..- attr(*, "daily_agg_cellfun")= chr [1:3] "none" "none" "none"
  ..- attr(*, "monthly_agg_cellfun")= chr [1:3] "none" "none" "none"
  ..- attr(*, "verification_time")= chr [1:3] "none" "none" "none"
 $ Data : num [1:3, 1, 1:42368, 1:53, 1:40] 1.01e+05 -4.18e-01 1.86 1.01e+05 -1.46 ...
  ..- attr(*, "dimensions")= chr [1:5] "var" "member" "time" "lat" ...
 $ xyCoords:List of 2
  ..$ x: num [1:40] 100 101 102 103 104 ...
  ..$ y: num [1:53] 10.5 11.2 12 12.8 13.5 ...
  ..- attr(*, "projection")= chr "LatLonProjection"
  ..- attr(*, "resX")= num 0.75
  ..- attr(*, "resY")= num 0.75
 $ Dates :List of 3
  ..$ :List of 2
  .. ..$ start: chr [1:42368] "1990-01-01 00:00:00 GMT" "1990-01-01 06:00:00 GMT" "1990-01-01 12:00:00 GMT" "1990-01-01 18:00:00 GMT" ...
  .. ..$ end : chr [1:42368] "1990-01-01 00:00:00 GMT" "1990-01-01 06:00:00 GMT" "1990-01-01 12:00:00 GMT" "1990-01-01 18:00:00 GMT" ...
  ..$ :List of 2
  .. ..$ start: chr [1:42368] "1990-01-01 00:00:00 GMT" "1990-01-01 06:00:00 GMT" "1990-01-01 12:00:00 GMT" "1990-01-01 18:00:00 GMT" ...
  .. ..$ end : chr [1:42368] "1990-01-01 00:00:00 GMT" "1990-01-01 06:00:00 GMT" "1990-01-01 12:00:00 GMT" "1990-01-01 18:00:00 GMT" ...
  ..$ :List of 2
  .. ..$ start: chr [1:42368] "1990-01-01 00:00:00 GMT" "1990-01-01 06:00:00 GMT" "1990-01-01 12:00:00 GMT" "1990-01-01 18:00:00 GMT" ...
  .. ..$ end : chr [1:42368] "1990-01-01 00:00:00 GMT" "1990-01-01 06:00:00 GMT" "1990-01-01 12:00:00 GMT" "1990-01-01 18:00:00 GMT" ...
 - attr(*, "dataset")= chr "/home/inspur/working/climate4r/ERA-I/box_surface_interim_1979_2018.nc"
 - attr(*, "R_package_desc")= chr "loadeR-v1.7.0"
 - attr(*, "R_package_URL")= chr "https://github.com/SantanderMetGroup/loadeR"
 - attr(*, "R_package_ref")= chr "https://doi.org/10.1016/j.envsoft.2018.09.009"

str(wsobs)
List of 4
 $ Variable:List of 2
  ..$ varName: chr "ws"
  ..$ level : NULL
  ..- attr(*, "use_dictionary")= logi FALSE
  ..- attr(*, "units")= chr ""
  ..- attr(*, "longname")= chr "ws"
  ..- attr(*, "daily_agg_cellfun")= chr "none"
  ..- attr(*, "monthly_agg_cellfun")= chr "none"
  ..- attr(*, "verification_time")= chr "none"
 $ Data : num [1:42368, 1:160, 1:120] 4.01 3.01 4.6 4.91 6.64 ...
  ..- attr(*, "dimensions")= chr [1:3] "time" "lat" "lon"
 $ xyCoords:List of 2
  ..$ x: num [1:120] 100 100 101 101 101 ...
  ..$ y: num [1:160] 10.1 10.4 10.6 10.9 11.1 ...
  ..- attr(*, "projection")= chr "LatLonProjection"
  ..- attr(*, "resX")= num 0.25
  ..- attr(*, "resY")= num 0.25
 $ Dates :List of 2
  ..$ start: chr [1:42368] "1990-01-01 00:00:00 GMT" "1990-01-01 06:00:00 GMT" "1990-01-01 12:00:00 GMT" "1990-01-01 18:00:00 GMT" ...
  ..$ end : chr [1:42368] "1990-01-01 00:00:00 GMT" "1990-01-01 06:00:00 GMT" "1990-01-01 12:00:00 GMT" "1990-01-01 18:00:00 GMT" ...
 - attr(*, "dataset")= chr "/home/inspur/working/climate4r/CCMP/box_CCMP_1990_2018_ws.nc"
 - attr(*, "R_package_desc")= chr "loadeR-v1.7.0"
 - attr(*, "R_package_URL")= chr "https://github.com/SantanderMetGroup/loadeR"
 - attr(*, "R_package_ref")= chr "https://doi.org/10.1016/j.envsoft.2018.09.009"
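To put the reported dimensions in perspective, the arithmetic below estimates the in-memory size of the two `Data` arrays, assuming R's usual 8 bytes per double. The dimensions are copied from the str() output above; the helper `gib` and the variable names are hypothetical.

```r
# Dimensions as reported by str(xs) and str(wsobs).
dim_xs    <- c(var = 3, member = 1, time = 42368, lat = 53, lon = 40)
dim_wsobs <- c(time = 42368, lat = 160, lon = 120)

bytes_per_double <- 8

# Size of a dense numeric array in GiB.
gib <- function(d) prod(d) * bytes_per_double / 2^30

xs_gib    <- gib(dim_xs)
wsobs_gib <- gib(dim_wsobs)

# Number of predictand grid points in wsobs.
n_points <- unname(dim_wsobs["lat"] * dim_wsobs["lon"])

cat(sprintf("xs$Data:    %.2f GiB\n", xs_gib))
cat(sprintf("wsobs$Data: %.2f GiB\n", wsobs_gib))
cat("predictand grid points:", n_points, "\n")
```

This gives roughly 2 GiB for the predictors and 6 GiB for the predictand before any copies made during scaling or cross-validation. If a separate GLM is fitted at each predictand grid point (an assumption here, not something stated in the thread), that is 19200 fits per fold on 6-hourly data, so the problem is much larger than the 30*40 predictor box suggests.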

louwangzhiyuwhy avatar Oct 15 '21 15:10 louwangzhiyuwhy