downscaleR
Problem with downscaleTrain when using method="NN"
Hello everyone! I'm downscaling daily precipitation with method="NN" using station data. The data set has less than 15% missing values during the training period. When I run downscaleTrain, the program gives the following message:
"65.87 % of observations contains NaN, removed from the training phase ..."
It seems that this percentage indicates the share of days on which at least one station has a NaN, and that those days are then removed from the training phase, which is really inconvenient for the modeling. However, when I repeat the experiment with a different method (GLM or analogs), this message does not appear. Is there something wrong with the code, or should I do something else? Thanks a lot, Matias
Hello Matias,
Any day containing a NaN value is discarded (even when the NaN appears at only one station) because neural networks are a multi-site method: all stations are trained simultaneously in a single net, so only days with complete observations across all stations can be used. GLMs, on the other hand, are single-site models, so each station's model is built by removing only that station's NaN dates, rather than every date on which any station has a NaN (as happens with the neural networks).
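To make the difference concrete, here is a minimal base-R sketch on a toy matrix. The matrix `obs` and its values are hypothetical and not taken from your data, and the code only mimics the counting logic described above; it does not call downscaleR.

```r
# Toy observation matrix (hypothetical values): rows are days, columns are stations
obs <- matrix(c(
  1.2,  NA, 0.0,
  0.0, 3.1, 0.4,
   NA, 0.0, 2.2,
  5.0, 1.1,  NA,
  0.3, 0.0, 0.0,
  0.0, 2.4, 1.8
), ncol = 3, byrow = TRUE,
dimnames = list(NULL, c("st1", "st2", "st3")))

# Single-site view: percentage of days missing at each station individually
missing.per.station <- colMeans(is.na(obs)) * 100

# Multi-site view: percentage of days on which at least one station is missing
missing.joint <- mean(!complete.cases(obs)) * 100

print(round(missing.per.station, 2))
print(round(missing.joint, 2))
```

Each toy station is missing only 1 of 6 days (about 16.7%), yet 3 of the 6 days (50%) contain at least one NA and would be dropped by a multi-site model. This is how a modest per-station gap rate can turn into a much larger joint removal rate.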
Therefore, to avoid this inherent limitation of multi-site neural networks when a large share of the data is NaN, consider building single-site neural networks. This can be done by calling the prepareData function with a specific "local.predictors" configuration. downscaleTrain will then treat the problem as single-site, because the predictors now differ among stations.
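# Station-specific (local) predictors make downscaleTrain treat this as a single-site problem
data <- prepareData(x = x, y = y, local.predictors = list(n = 4, vars = getVarNames(x)))
# Train the neural networks on the prepared data
model.nnets <- downscaleTrain(data, method = "NN", hidden = c(10, 5), output = "linear")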
Hope this helps,
Jorge
Thank you Jorge, your advice was really helpful! Regards, Matias