downscaleR

Problems with downscaleCV function

Open rbalmaceda opened this issue 5 years ago • 7 comments

Hi, I'm using the downscaleCV function to test some GLM models with folds in which the years are not consecutive, for example:

fold1: 1979 1985 1991 1997 2003 2009
fold2: 1980 1986 1992 1998 2004 2010
fold3: 1981 1987 1993 1999 2005 2011
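For reference, folds with this interleaved pattern can be built in base R. The sketch below assumes a 1979-2014 record split into six folds; both values are assumptions for illustration, and the variable names are hypothetical rather than part of downscaleR.

```r
# Assumed record: 36 consecutive years (illustrative, not read from a real grid)
years <- 1979:2014
n_folds <- 6

# Assign year number i to fold ((i - 1) mod n_folds) + 1,
# so each fold takes every 6th year
fold_id <- (seq_along(years) - 1) %% n_folds + 1

# split() groups the years by fold id; unname() drops the "1".."6" labels
folds <- unname(split(years, fold_id))

print(folds[[1]])
print(folds[[2]])
```

The resulting list of year vectors has the same shape as the folds argument discussed in this thread.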

This seems to be a problem for the function, which gives the following error: "In the parameters folds you have indicated years that do not belong to the dataset. Please revise the setup of this parameter". The error is associated with setting sampling.strategy = "kfold.chronological" and with the following part of the code:

    if (sampling.strategy == "kfold.chronological") {
      type <- "chronological"
      if (!is.numeric(folds)) {
        folds.user <- unique(unlist(folds))
        folds.data <- unique(getYearsAsINDEX(y))
        if (any(folds.user != folds.data)) stop("In the parameters folds you have indicated years that do not belong to the dataset. Please revise the setup of this parameter.")
      }

I don't know whether I should use kfold.chronological in this case, or whether there is another way to run the CV using folds with non-consecutive years. Thank you!

rbalmaceda, Jan 20 '20 01:01

Hi Rocio,

I have tested the non-consecutive years case with the example provided in the downscaleCV help (type ?downscaleCV in R to see the example). I changed the folds to the following while keeping sampling.strategy = "kfold.chronological":

    folds = list(c(1985, 1987, 1992),
                 c(1989, 1990, 1986, 1995),
                 c(1988, 1991, 1993, 1994))

... and for me it worked. It is true that setting "sampling.strategy" to "kfold.chronological" is misleading when the folds may in fact be non-chronological, and this will be addressed in future releases. However, the splitting of the folds within the function is done correctly.

One possible reason for that error is that your folds do not include all the years available in your data. Please check that every year in your grid is also in the folds. For example, I see that the years 1982, 1983 and 1984, among others, are not in your fold list. If the two sets of years do not match, the function returns the error you reported. If you only want to use a subset of the years in your original grid, I recommend using subsetGrid from the transformeR library as a step before downscaleCV.
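The check described above can be sketched in base R with setdiff. The data years below (1979-2014) are an assumption for illustration, and the variable names are hypothetical; in practice the years would come from the grid itself.

```r
# Hypothetical years present in the grid (assumed for illustration)
data_years <- 1979:2014

# The three folds quoted in the question
folds <- list(
  c(1979, 1985, 1991, 1997, 2003, 2009),
  c(1980, 1986, 1992, 1998, 2004, 2010),
  c(1981, 1987, 1993, 1999, 2005, 2011)
)

fold_years <- unlist(folds)

# Years in the data that no fold covers
missing_from_folds <- setdiff(data_years, fold_years)

# Years listed in a fold that the data does not contain
unknown_in_folds <- setdiff(fold_years, data_years)

print(missing_from_folds)
print(unknown_in_folds)
```

If either result is non-empty, the folds and the data disagree: either more folds are needed or the grid should be subset to the years actually used.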

Please keep me informed,

Jorge

jorgebanomedina, Jan 22 '20 15:01

Hi Jorge, sorry for the late response. I still can't find the problem. I'm using all the years available in my data. Could it be the version I am using?

downscaleR version 3.1.0
transformeR version 1.6.1
loadeR version 1.6.0

In downscaleCV, in the part that compares folds.user and folds.data, my years are the same but in a different order, so the function assumes that some year is missing:

folds.user
1979 1985 1991 1997 2003 2009 1980 1986 1992 1998 2004 2010 1981 1987 1993 1999 2005 2011
1982 1988 1994 2000 2006 2012 1983 1989 1995 2001 2007 2013 1984 1990 1996 2002 2008 2014
folds.data
1979 1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990 1991 1992 1993 1994 1995 1996
1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014

any(folds.user != folds.data)
[1] TRUE
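The behaviour reported here follows from != comparing the two vectors position by position. A minimal base R sketch, reusing the names folds.user and folds.data from the excerpt, contrasts that with an order-insensitive comparison; it illustrates the idea only and is not necessarily how the released fix is implemented.

```r
# Years in chronological order, as returned for the data
folds.data <- 1979:2014

# Same years in fold order (every 6th year grouped together),
# reproducing the folds.user sequence shown above
folds.user <- unlist(split(folds.data, (seq_along(folds.data) - 1) %% 6),
                     use.names = FALSE)

# Element-wise comparison: TRUE as soon as any position differs
order_sensitive <- any(folds.user != folds.data)

# Set comparison: ignores order, only checks membership
same_years <- setequal(folds.user, folds.data)

print(order_sensitive)
print(same_years)
```

Sorting both vectors before comparing them would be another way to make the check independent of order.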

Does this not happen with your data? Thanks for everything, Rocio

rbalmaceda, Feb 03 '20 14:02

You are right: your issue had already been fixed, but the fix had not been released yet. I have just released it, so please update downscaleR to v3.1.1 and transformeR to v1.7.1. In short, install the latest release of each.
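Whether an installation meets these minimum versions can be checked with base R's package_version. The sketch below compares the versions reported earlier in this thread against the ones named here; the variable and function names are hypothetical.

```r
# Minimum versions named in this reply
required <- c(downscaleR = "3.1.1", transformeR = "1.7.1")

# Versions reported earlier in the thread (hard-coded for illustration)
reported <- c(downscaleR = "3.1.0", transformeR = "1.6.1")

# package_version() compares version components numerically, not as strings
meets_minimum <- function(installed, minimum) {
  package_version(installed) >= package_version(minimum)
}

up_to_date <- mapply(meets_minimum, reported, required)
print(up_to_date)
```

On a real installation, the reported values could instead be taken from as.character(utils::packageVersion("downscaleR")) and the equivalent call for transformeR.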

By the way, I took the opportunity to also allow the parameter sampling.strategy to be NULL, which is more appropriate for the way you split your data. Your call to downscaleCV would then look like this:

    pred <- downscaleCV(x, y,
                        sampling.strategy = NULL,
                        method = "GLM",
                        folds = list(your folds))  # replace "your folds" with your year vectors

Hope this helps,

Jorge

jorgebanomedina, Feb 05 '20 17:02

Thanks Jorge, are these versions already available? I updated the packages (with @master) and the latest versions I get are transformeR 1.7.0 and downscaleR 3.1.0. Thanks again, Rocio

rbalmaceda, Feb 06 '20 19:02

I just changed the version names. Try to install them again and you'll have them.

Thanks!

jorgebanomedina, Feb 07 '20 09:02

Hi Jorge, I'm sorry to bother you again. I ran the new version and it went well. But I wanted to ask whether it is possible that, in the final product, in the Data element of the list returned by downscaleCV, the years are not assigned correctly by the function. It looks as if the function pastes the years of the folds one after another at the end rather than in chronological order. I suspect this because I calculated the correlations between the observed data and my trained model and got some strange results. When I reordered the years, I got more coherent values. I don't know if I'm right, but I'm asking just in case. Thank you
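The reordering described here can be sketched in base R with order(). The data below are made up for illustration (two small folds, one value per year), and the names are hypothetical; this shows the idea only, not the change later made in downscaleR.

```r
# Hypothetical folds with interleaved years (illustrative data only)
folds <- list(c(1979, 1981), c(1980, 1982))

# Results concatenated fold after fold, as described above.
# Each "prediction" just encodes its year so the order is easy to follow.
years_in_fold_order <- unlist(folds)
pred_in_fold_order <- years_in_fold_order * 10

# Permutation that sorts the years chronologically
chrono <- order(years_in_fold_order)

# Apply the same permutation to the years and to the predictions
years_chrono <- years_in_fold_order[chrono]
pred_chrono <- pred_in_fold_order[chrono]

print(years_chrono)
print(pred_chrono)
```

With daily data there are many time steps per year, so the permutation would be computed from the date stamps rather than from the years alone.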

rbalmaceda, Feb 10 '20 19:02

OK Rocio, you were right: there was a bug when ordering years that were not chronological. I think I have solved it. Update your downscaleR version from the master branch and check whether it works.

Thank you for pointing out the bug!

jorgebanomedina, Feb 11 '20 18:02