
Running an example with data_module

Open · anlarro opened this issue 2 years ago · 5 comments

Hi, I was trying to run an example with my own data, but when I run app.py it asks me for a data_module flag: 'data_module flag must be provided (path to the data preprocessing module)'. I read the documentation and the code but couldn't find any reference to this.

Could you please provide more info about this? Thank you!

anlarro, Jul 26 '22 11:07

Hi, thanks for your interest in DL4DS. The documentation needs some work, and a tutorial is a work in progress. My suggestion would be to call dl4ds.SupervisedTrainer or dl4ds.CGANTrainer directly in your script, passing your (preprocessed) data variables. Have you tried this?

Bear in mind that the app.py module is very experimental; it is what I used to run my experiments on a cluster with a workflow manager. The data_module is just a Python script where you run your preprocessing steps (e.g., slicing, splitting, normalizing/standardizing the data) and declare some variables. These variables are then accessed in app.py, e.g., DATA.data_train or DATA.predictors_train, when feeding the training or inference steps.
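To make this concrete, here is a minimal sketch of what such a data module could contain. It assumes NumPy arrays shaped [time, lat, lon, channel]; the file name, the random placeholder data, and the 70/15/15 split are illustrative assumptions, and only the names data_train, data_val and data_test mirror variables mentioned in this thread. The full set of variables app.py actually reads may differ.

```python
# data_module.py: a hypothetical example, not part of DL4DS itself.
# It runs a few preprocessing steps and declares module-level variables
# that a driver script could read after importing it (e.g. DATA.data_train).
import numpy as np

# Placeholder input: swap in your own array shaped [time, lat, lon, channel].
rng = np.random.default_rng(seed=0)
raw = rng.normal(loc=280.0, scale=5.0, size=(100, 32, 32, 1)).astype(np.float32)

# 1) Slice: e.g. keep only the first 100 time steps (adjust to your period).
sliced = raw[:100]

# 2) Split along the time axis into 70% train, 15% validation, 15% test.
n = sliced.shape[0]
i_train = n * 70 // 100
i_val = n * 85 // 100
train, val, test = sliced[:i_train], sliced[i_train:i_val], sliced[i_val:]

# 3) Standardize with statistics from the training split only,
#    so nothing from validation/test leaks into training.
mean, std = train.mean(), train.std()

# Variables the driver script is expected to pick up.
data_train = (train - mean) / std
data_val = (val - mean) / std
data_test = (test - mean) / std
```

A driver script would then do something like import data_module as DATA and read DATA.data_train; how app.py loads the module from the path given in the data_module flag is not shown here.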

carlos-gg, Jul 27 '22 12:07

Thank you for your quick response. I'm calling dl4ds.SupervisedTrainer using only data_train, data_val and data_test. However, when executing trainer.run() I get: Unexpected result of train_function (Empty logs). Please use Model.compile(..., run_eagerly=True), or tf.config.run_functions_eagerly(True) for more information of where went wrong, or file a issue/bug to tf.keras.

From what I have found, this error may be caused by a wrong input data shape. My input data are xr.DataArray objects with shape [time, latitude, longitude, 1].
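One way to rule out shape or dtype problems is to normalize the inputs explicitly before handing them to the trainer. The helper below is a hypothetical sketch (to_model_input is not a DL4DS function) that assumes the trainer accepts channel-last float32 NumPy arrays; check the DL4DS documentation for the exact input requirements.

```python
import numpy as np

def to_model_input(arr):
    """Return arr as a float32 NumPy array shaped [time, lat, lon, channel].

    For an xarray.DataArray, pass da.values (the underlying NumPy array).
    A 3-D [time, lat, lon] array gets a trailing channel axis of size 1.
    """
    a = np.asarray(arr, dtype=np.float32)
    if a.ndim == 3:
        a = a[..., np.newaxis]  # add the channel axis
    if a.ndim != 4:
        raise ValueError(f"expected [time, lat, lon(, channel)], got shape {a.shape}")
    return a

x = to_model_input(np.zeros((10, 20, 30)))
print(x.shape, x.dtype)  # (10, 20, 30, 1) float32
```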

anlarro, Jul 27 '22 15:07

The error doesn't tell me much so I'm not sure it's even related to the data (shape, format). Please provide more information about how you call the trainer and the full error.

carlos-gg, Jul 29 '22 09:07

Hi Carlos, thank you for taking care of this. Indeed, the error didn't say much, but I figured out that the problem was the batch size: by setting a lower batch size I was able to train a model.
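The thread does not say why a smaller batch size helped. One plausible cause, stated here only as an assumption: Keras raises this "Empty logs" error when an epoch runs zero training steps, which can happen when the batch size is larger than the number of training samples and incomplete batches are dropped. The hypothetical helpers below (not part of DL4DS or Keras) compute the steps per epoch and pick the largest candidate batch size that still yields at least one full batch.

```python
def steps_per_epoch(n_samples, batch_size, drop_remainder=True):
    """Number of training steps in one epoch (hypothetical helper)."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    if drop_remainder:
        return n_samples // batch_size  # incomplete last batch is discarded
    return -(-n_samples // batch_size)  # ceiling division keeps the last batch

def largest_usable_batch_size(n_samples, candidates=(256, 128, 64, 32, 16, 8, 4, 2, 1)):
    """Return the largest candidate (listed in descending order) giving >= 1 full batch."""
    for bs in candidates:
        if steps_per_epoch(n_samples, bs) > 0:
            return bs
    raise ValueError("no candidate batch size fits the training set")

print(steps_per_epoch(20, 64))         # 0 -> an epoch with no steps
print(largest_usable_batch_size(20))   # 16
```

Memory limits are a separate constraint: a batch size that passes this check can still be too large for the available GPU/CPU memory.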

I have another question: should all the LR data be at the same resolution? I mean, should data_train_lr, predictors_train, and static_vars all be at the same resolution, or can I have different resolutions for, e.g., data_train_lr and static_vars?

anlarro, Sep 14 '22 10:09

Hi Andrés, I'm glad you found the issue. batch_size is a tricky hyperparameter to set, as it depends on many factors, such as the size of the model, the available GPU/CPU memory, and the size/dimensionality of the training samples. So it's very case-dependent.
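To get a feel for how sample dimensionality enters the picture, the arithmetic below estimates the memory taken by one batch of input arrays alone. It is a rough, hypothetical helper that assumes float32 inputs (4 bytes per value); it is only a lower bound, since model weights, gradients and intermediate activations usually need far more memory than the inputs.

```python
def input_batch_mebibytes(batch_size, height, width, channels, bytes_per_value=4):
    """Memory (MiB) of one batch of input arrays, excluding the model itself."""
    return batch_size * height * width * channels * bytes_per_value / 2**20

# 64 samples of a 256 x 256 single-channel float32 grid:
print(input_batch_mebibytes(64, 256, 256, 1))  # 16.0
```

Doubling the grid size in each spatial dimension quadruples this figure, which is one reason a batch size that works on one dataset can fail on another.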

To answer your question: the parameters data_train_lr, data_val_lr and data_test_lr require low/coarse-resolution data. predictors_train is for inputting time-varying predictors, and these can come at high or intermediate resolution (DL4DS will internally interpolate/resize the arrays when needed). static_vars, on the other hand, must be high-resolution variables, such as elevation/topography. So yes, you can have different resolutions for data_train_lr and static_vars.
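The resolution rules above can be turned into a quick sanity check before training. This is a hypothetical sketch, not a DL4DS function, and it rests on stated assumptions: data_train (HR) and data_train_lr (LR) are [time, lat, lon, channel] arrays covering the same time steps, static_vars is a list of 2-D HR grids, and the HR grid is exactly an integer scale factor times the LR grid. predictors_train is not checked, since DL4DS resizes it internally. Confirm the exact formats in the DL4DS documentation.

```python
import numpy as np

def check_resolutions(data_train, data_train_lr, static_vars, scale):
    """Raise ValueError if the HR, LR and static grids are inconsistent.

    Assumptions: [time, lat, lon, channel] arrays for data_train (HR) and
    data_train_lr (LR), static_vars as a list of 2-D HR grids, integer scale.
    """
    hr_lat, hr_lon = data_train.shape[1:3]
    lr_lat, lr_lon = data_train_lr.shape[1:3]
    if data_train.shape[0] != data_train_lr.shape[0]:
        raise ValueError("HR and LR arrays must cover the same time steps")
    if (lr_lat * scale, lr_lon * scale) != (hr_lat, hr_lon):
        raise ValueError(
            f"LR grid {lr_lat}x{lr_lon} times scale {scale} != HR grid {hr_lat}x{hr_lon}"
        )
    for i, var in enumerate(static_vars):
        # Static variables (e.g. elevation) must already be at high resolution.
        if np.shape(var)[:2] != (hr_lat, hr_lon):
            raise ValueError(
                f"static_vars[{i}] grid {np.shape(var)[:2]} != HR grid {hr_lat}x{hr_lon}"
            )

hr = np.zeros((10, 64, 64, 1))
lr = np.zeros((10, 16, 16, 1))
elevation = np.zeros((64, 64))
check_resolutions(hr, lr, [elevation], scale=4)  # passes silently
```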

carlos-gg, Sep 15 '22 08:09