arcgis-python-api

Tetouan power forecast new

Open moonlanderr opened this issue 1 year ago • 2 comments

<insert pull request description here>


Checklist

Please go through each entry in the checklist below and mark an 'X' if that condition has been met. Every entry should be marked with an 'X' to get the Pull Request approved.

  • [ ] All imports are in the first cell?
    • [ ] First block of imports are standard libraries
    • [ ] Second block are 3rd party libraries
    • [ ] Third block are all arcgis imports? Note that in some cases, for samples, it is a good idea to keep the imports next to where they are used, particularly for uncommonly used features that we want to highlight.
  • [ ] All GIS object instantiations are one of the following?
    • gis = GIS()
    • gis = GIS('home') or gis = GIS('pro')
    • gis = GIS(profile="your_online_portal")
    • gis = GIS(profile="your_enterprise_portal")
  • [ ] If this notebook requires setup or teardown, did you add the appropriate code to ./misc/setup.py and/or ./misc/teardown.py?
  • [ ] If this notebook references any portal items that need to be staged on AGOL/Python API playground, did you coordinate with a Python API team member to stage the item the correct way with the api_data_owner user?
  • [ ] If the notebook requires working with local data (such as CSV, FGDB, SHP, Raster files), upload the files as items to the Geosaurus Online Org using api_data_owner account and change the notebook to first download and unpack the files.
  • [ ] Code simplified & split out across multiple cells, useful comments?
  • [ ] Consistent voice/tense/narrative style? Thoroughly checked for typos?
  • [ ] All images used like <img src="base64str_here"> instead of <img src="https://some.url">? All map widgets contain a static image preview? (Call mapview_inst.take_screenshot() to do so)
  • [ ] All file paths are constructed in an OS-agnostic fashion with os.path.join()? (Instead of r"\foo\bar", os.path.join(os.path.sep, "foo", "bar"), etc.)
  • [ ] Is your code formatted using Jupyter Black? You can use Jupyter Black to format your code in the notebook.
  • [ ] If this notebook showcases deep learning capabilities, please go through the following checklist:
    • [ ] Are the inputs required for Export Training Data Using Deep Learning tool published on geosaurus org (api data owner account) and added in the notebook using gis.content.get function?
    • [ ] Is training data zipped and published as Image Collection? Note: Whole folder is zipped with name same as the notebook name.
    • [ ] Are the inputs required for model inferencing published on geosaurus org (api data owner account) and added in the notebook using gis.content.get function? Note: This includes providing test raster and trained model.
    • [ ] Are the inferenced results displayed using a webmap widget?
  • [ ] IF YOU WANT THIS SAMPLE TO BE DISPLAYED ON THE DEVELOPERS.ARCGIS.COM WEBSITE, ping @jyaistMap so he can add it to the list for the next deploy.

moonlanderr avatar Feb 15 '24 10:02 moonlanderr

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@BP-Ent Could you help in doc review of this?

priyankatuteja avatar Feb 27 '24 10:02 priyankatuteja

@BP-Ent, any update on this? We were targeting the notebook for this release, so it would be great if you could review it and suggest feedback. Thanks.

moonlanderr avatar Mar 05 '24 07:03 moonlanderr

I apologize @moonlanderr, I have been on medical leave! I'm going through my emails and issues and will look at this to see where I can fit it in before the 11.3 release.

BP-Ent avatar Mar 18 '24 20:03 BP-Ent

View / edit / reply to this conversation on ReviewNB

BP-Ent commented on 2024-03-20T18:23:08Z ----------------------------------------------------------------

In this notebook, we will forecast the power consumption of Tetouan city for one day in 10-minute increments using deep learning time series techniques. This short-term time series forecasting can be crucial in optimizing grid operations, enhancing reliability, reducing costs, and facilitating the integration of renewable energy sources. It can also serve as a vital tool that allows utilities to adapt to changing demand patterns and move towards a more sustainable future.

This process involves the use of advanced machine learning models to predict future electricity usage based on historical data. The process can be broken down into the following methods and models: Multi-Step Multivariate Forecasting, One-Step Univariate Forecasting, One-Step Multivariate Forecasting, the InceptionTime specialized time series backbone, and the Bidirectional LSTM specialized time series backbone.


View / edit / reply to this conversation on ReviewNB

BP-Ent commented on 2024-03-20T18:23:09Z ----------------------------------------------------------------

The dataset employed in this illustrative study consists of a multivariate time series comprising power consumption data recorded every 10 minutes in Tetouan city. The data spans from January 2017 to December 2017, encompassing each day within that timeframe. The multivariate time series consists of historical power consumption, temperature, humidity, wind speed, and other relevant variables. The following cell downloads the data:


View / edit / reply to this conversation on ReviewNB

BP-Ent commented on 2024-03-20T18:23:10Z ----------------------------------------------------------------

Once the data has been downloaded, we will first use one-step univariate forecasting, which will serve as a baseline for more complex forecasting models. In this approach, the model predicts one step ahead at a time. For this study of power consumption, this means predicting the usage for the next 10 minutes based solely on the historical values of that specific variable up to the current timestep. Thus, using past observations of the single variable Total Power Consumption, we will estimate future values for the given number of future timesteps. This method assumes that the future value depends only on the immediately preceding values of the same variable, which makes the approach relatively straightforward.


View / edit / reply to this conversation on ReviewNB

BP-Ent commented on 2024-03-20T18:23:11Z ----------------------------------------------------------------

Data preparation for a timeseries consists of first splitting the dataset into a training dataset and a testing dataset as follows:
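As an illustration of the idea only (not the notebook's actual cell), a chronological split can be sketched in plain Python. The helper name `chronological_split` and the synthetic `readings` list are hypothetical; the key point is that a time series is split by position, without shuffling, so the test set lies strictly after the training set.

```python
def chronological_split(rows, test_size):
    """Split time-ordered rows into (train, test) without shuffling."""
    if not 0 < test_size < len(rows):
        raise ValueError("test_size must be between 1 and len(rows) - 1")
    return rows[:-test_size], rows[-test_size:]


# Hypothetical series: 3 days of 10-minute readings (144 readings per day).
readings = list(range(3 * 144))

# Hold out the final day (144 timesteps) for testing.
train, test = chronological_split(readings, test_size=144)
print(len(train), len(test))
```

Holding out the last block of observations mirrors how the model is used in practice: it is trained on the past and evaluated on a later period.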


View / edit / reply to this conversation on ReviewNB

BP-Ent commented on 2024-03-20T18:23:12Z ----------------------------------------------------------------

When forecasting a single variable using only its past values, understanding its autocorrelation structure becomes crucial. Autocorrelation plots help visualize the relationship between a time series and its past values at various lags. This allows us to identify any significant autocorrelation patterns that can guide model selection and parameter tuning.

The autocorrelation plot below for the power consumption time series shows the correlation between the series and its lagged values at different time lags.
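For readers who want to see what such a plot measures, the following is a minimal sketch of the standard sample autocorrelation estimator in plain Python. The function name `autocorrelation` and the synthetic sine series (one cycle per 144 timesteps, standing in for a daily pattern) are assumptions for illustration and are not taken from the notebook.

```python
import math


def autocorrelation(series, lag):
    """Sample autocorrelation of a series with itself shifted by `lag` steps."""
    n = len(series)
    if not 0 <= lag < n:
        raise ValueError("lag must satisfy 0 <= lag < len(series)")
    mean = sum(series) / n
    variance = sum((x - mean) ** 2 for x in series)
    if variance == 0:
        raise ValueError("autocorrelation is undefined for a constant series")
    covariance = sum(
        (series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag)
    )
    return covariance / variance


# Synthetic stand-in for a daily cycle sampled every 10 minutes.
period = 144
series = [math.sin(2 * math.pi * t / period) for t in range(10 * period)]

acf_zero = autocorrelation(series, 0)            # a series always matches itself
acf_half = autocorrelation(series, period // 2)  # half a cycle apart: opposite phase
acf_full = autocorrelation(series, period)       # one full cycle apart: same phase
```

For a strongly periodic series, the value is high at lags that are multiples of the cycle length and strongly negative half a cycle away, which is the kind of pattern an autocorrelation plot makes visible.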


View / edit / reply to this conversation on ReviewNB

BP-Ent commented on 2024-03-20T18:23:13Z ----------------------------------------------------------------

Here we can see that the autocorrelation plot shows maximum autocorrelation at lag zero and gradually decreases over subsequent lags, which suggests a strong immediate dependence between consecutive observations, potentially indicating underlying seasonality or trend components in the data. This indicates the suitability of this data for univariate timeseries forecasting.


View / edit / reply to this conversation on ReviewNB

BP-Ent commented on 2024-03-20T18:23:14Z ----------------------------------------------------------------

Once the dataset has been divided into the training and testing datasets, we can use the training data for modelling.


View / edit / reply to this conversation on ReviewNB

BP-Ent commented on 2024-03-20T18:23:15Z ----------------------------------------------------------------

In this method, we are using a single variable named Total Power Consumption to forecast the 144 timesteps of future total power consumption or electricity usage for every 10 minutes based on its historical data, without using any explanatory variables.

The pre-processing of the data is carried out by the prepare_tabulardata method from the arcgis.learn module in the ArcGIS API for Python. This function will take either a non spatial dataframe, a feature layer, or a spatial dataframe containing the dataset as input and will return a TabularDataObject that can be fed into the model. Here we are using a non spatial dataframe.

The primary input parameters required for the tool are:

  • input_features : non spatial dataframe containing the primary dataset
  • variable_predict : field name Total Power Consumption as the y-variable to be forecasted from the input dataframe
  • explanatory_variables : Since there are none in this example, it is not required here.
  • index_field : field name containing the timestamp

Here, the preprocessor is used for scaling the data to improve the fit of the model.
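To make the scaling step concrete, here is a small sketch that assumes min-max scaling; the notebook's actual preprocessor may be a different scaler. The helper names `fit_min_max`, `scale`, and `unscale`, and the sample values, are hypothetical.

```python
def fit_min_max(values):
    """Learn the scaling range from the training values only."""
    low, high = min(values), max(values)
    if high == low:
        raise ValueError("cannot scale a constant series")
    return low, high


def scale(values, low, high):
    """Map values into the [0, 1] range learned from the training data."""
    return [(v - low) / (high - low) for v in values]


def unscale(values, low, high):
    """Invert the scaling to get back to the original units."""
    return [v * (high - low) + low for v in values]


# Hypothetical power consumption readings.
train_values = [20000.0, 25000.0, 30000.0, 40000.0]

low, high = fit_min_max(train_values)
scaled = scale(train_values, low, high)
restored = unscale(scaled, low, high)
```

Fitting the range on the training data and reusing it later keeps information from the test period out of the training process.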


View / edit / reply to this conversation on ReviewNB

BP-Ent commented on 2024-03-20T18:23:16Z ----------------------------------------------------------------

Next, a sequence length of 288 is used, which corresponds to the previous two days of power consumption data (144 ten-minute readings per day). This is an important parameter for fitting a time series forecasting model and usually reflects the seasonality of the data. It can be experimented with for a better fit.

Using this sequence length, we can use the show_batch() function for visualization. The graph below depicts the segmentation of univariate time series data into batches, where each batch aligns with the specified sequence length designated for the model. The x-axis delineates the data, organized in batches, with each tick at a 6-day interval. The y-axis represents the absolute values of power consumption. Notably, the value at the top of each graph is the target variable for forecasting, that is, the value immediately after the end of the respective sequence. This value serves as the dependent variable during the training of the time series model.
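The relationship between a sequence and its target value can be sketched as a sliding window. This is an illustrative reconstruction in plain Python, not the internals of show_batch() or of the library; the function name `one_step_windows` and the synthetic series are hypothetical.

```python
def one_step_windows(series, sequence_length):
    """Pair each window of `sequence_length` values with the value that follows it."""
    samples = []
    for start in range(len(series) - sequence_length):
        end = start + sequence_length
        samples.append((series[start:end], series[end]))
    return samples


STEPS_PER_DAY = 24 * 60 // 10          # 144 ten-minute readings per day
SEQUENCE_LENGTH = 2 * STEPS_PER_DAY    # 288, i.e. the previous two days

# Hypothetical series covering three days.
series = list(range(3 * STEPS_PER_DAY))

samples = one_step_windows(series, SEQUENCE_LENGTH)
first_inputs, first_target = samples[0]
```

Each sample holds 288 consecutive readings as input and the single reading that follows them as the value to predict.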


View / edit / reply to this conversation on ReviewNB

BP-Ent commented on 2024-03-20T18:23:17Z ----------------------------------------------------------------

This is the most significant step for fitting a time series model. Here, along with the preprocessed data, the backbone for training the model and the sequence length are passed as parameters. Of these three, the sequence length must be selected carefully, since it is a critical parameter. The sequence length is usually the cycle of the data. You can try higher sequence lengths if sufficient computing resources are available.


View / edit / reply to this conversation on ReviewNB

BP-Ent commented on 2024-03-20T18:23:18Z ----------------------------------------------------------------

The available set of backbones encompasses various architectures specialized for handling time series data. These include models specifically designed for time series (InceptionTime, TimeSeriesTransformer), recurrent neural networks such as LSTM and Bidirectional LSTM, a fully convolutional network (FCN), and adaptations of convolutional neural networks (ResNet, ResCNN) for effective time series analysis.

Here we will use the LSTM Bidirectional model.


View / edit / reply to this conversation on ReviewNB

BP-Ent commented on 2024-03-20T18:23:19Z ----------------------------------------------------------------

The model is now ready for training. To train the model, the model.fit method is called and provided with the number of epochs for training and the estimated learning rate suggested by lr_find in the previous step. We will use two epochs for training, as two epochs were found to be sufficient for the model to converge due to the high quality of the data, the large size of the dataset, and the good seasonality in the data. In other cases, we might have to train further and use more epochs:


View / edit / reply to this conversation on ReviewNB

BP-Ent commented on 2024-03-20T18:23:20Z ----------------------------------------------------------------

Next, we will use show_results to compare the actual and forecasted values and understand the performance of the model. The value at the top of each left-hand graph is the actual target value, that is, the value immediately after the end of the sequence, whereas the value at the top of the corresponding right-hand graph is the value forecasted by the trained model. The x-axis delineates the data, organized in batches, with each tick at a 6-day interval, and the y-axis represents the normalized values of the power consumption variable. The plot reveals that the ground truths are close to the forecasted values, indicating a good fit. This is further validated by checking the model score.


View / edit / reply to this conversation on ReviewNB

BP-Ent commented on 2024-03-20T18:23:21Z ----------------------------------------------------------------

Multivariate forecasting involves using multiple time series variables (e.g., historical power consumption, temperature, humidity, etc.) to make predictions. This allows the model to capture more complex relationships and dependencies; however, it can also be more computationally intensive. Multi-Step forecasting methods involve predicting multiple future time steps at once. For instance, forecasting the power consumption for the next several timesteps simultaneously.
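The difference from the one-step case can be sketched as follows: each training sample pairs a window of multivariate rows with several future values of the target, rather than a single one. This is an illustrative sketch only; the function name `multi_step_windows`, the column order, and the tiny synthetic rows are hypothetical.

```python
def multi_step_windows(rows, sequence_length, horizon, target_index=0):
    """Pair each window of rows with the next `horizon` values of the target column."""
    samples = []
    last_start = len(rows) - sequence_length - horizon
    for start in range(last_start + 1):
        split = start + sequence_length
        inputs = rows[start:split]
        targets = [row[target_index] for row in rows[split:split + horizon]]
        samples.append((inputs, targets))
    return samples


# Hypothetical rows of (power, temperature, humidity).
rows = [(float(t), 20.0 + t % 5, 60.0 - t % 3) for t in range(10)]

samples = multi_step_windows(rows, sequence_length=4, horizon=2)
inputs, targets = samples[0]
```

Here the inputs keep every variable for each timestep in the window, while the targets contain only the future values of the variable being forecast.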


View / edit / reply to this conversation on ReviewNB

BP-Ent commented on 2024-03-20T18:23:22Z ----------------------------------------------------------------

Data preparation for timeseries consists of first splitting the dataset into a training dataset and a testing dataset as follows:


View / edit / reply to this conversation on ReviewNB

BP-Ent commented on 2024-03-20T18:23:23Z ----------------------------------------------------------------

Once the dataset is divided into the training and testing datasets, the training data is ready to be used for modelling.


View / edit / reply to this conversation on ReviewNB

BP-Ent commented on 2024-03-20T18:23:24Z ----------------------------------------------------------------

Next, we will use the additional variables Temperature, Humidity, Wind Speed, general diffuse flows, and diffuse flows, combined with the related datetime information of month, weekday, and hour. Of these, month and weekday are used as categorical variables. As we did earlier, we will forecast the 144 timesteps of future total power consumption or electricity usage for every 10 minutes based on historical data, this time using these explanatory variables.

The preprocessing of the data is done by the prepare_tabulardata method from the arcgis.learn module in the ArcGIS API for Python.

Here, the additional parameter explanatory_variables will be used along with the parameters used earlier.

This function will take either a non spatial dataframe, a feature layer, or a spatial dataframe containing the dataset as input and will return a TabularDataObject that can be fed into the model. Here we are using a non spatial dataframe.

The additional input parameter required for the tool is:

  • explanatory_variables : We will pass the selected multiple variables in a list, along with declaring the relevant categorical variables

The preprocessor is used for scaling the data, which usually improves the fit of the model.
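The datetime-derived variables mentioned above (month, weekday, and hour) can be computed from each timestamp. The sketch below uses only the standard library; the function name `calendar_features` and the dictionary keys are hypothetical and do not come from the notebook.

```python
from datetime import datetime


def calendar_features(timestamp):
    """Derive month, weekday, and hour from a single timestamp."""
    return {
        "month": timestamp.month,          # 1 (January) to 12 (December)
        "weekday": timestamp.weekday(),    # 0 (Monday) to 6 (Sunday)
        "hour": timestamp.hour,            # 0 to 23
    }


# One reading from the period covered by the dataset.
features = calendar_features(datetime(2017, 1, 1, 13, 40))
print(features)
```

Month and weekday are codes rather than quantities, which is why the text treats them as categorical variables.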


View / edit / reply to this conversation on ReviewNB

BP-Ent commented on 2024-03-20T18:23:25Z ----------------------------------------------------------------

The sequence length used is 288, the same as earlier, which is the previous two days of power consumption data.


View / edit / reply to this conversation on ReviewNB

BP-Ent commented on 2024-03-20T18:23:26Z ----------------------------------------------------------------

As explained earlier, with this sequence length, we use the show_batch() function for visualization. However, for multivariate modeling, the show_batch function is currently experimental, so only the graph of the forecasting variable (blue) is appropriate, and you can ignore the explanatory variable graphs, which will be updated soon. Notably, the value on the top of the graph signifies the target variable for forecasting, denoting the value after the end of the respective sequence lengths. This value serves as the dependent variable during the training of the time series model.


View / edit / reply to this conversation on ReviewNB

BP-Ent commented on 2024-03-20T18:23:26Z ----------------------------------------------------------------

Along with the sequence length and model architecture parameters we used earlier, we will also pass the additional parameter of multistep=True for initializing the model. For the model architecture, we will use InceptionTime, which is a backbone specifically designed for time series.


View / edit / reply to this conversation on ReviewNB

BP-Ent commented on 2024-03-20T18:23:27Z ----------------------------------------------------------------

Finally, the model is now ready for training. To train the model, the model.fit method is called and provided with the number of epochs for training and the estimated learning rate suggested by lr_find in the previous step. As earlier, we will train it for two epochs.


View / edit / reply to this conversation on ReviewNB

BP-Ent commented on 2024-03-20T18:23:28Z ----------------------------------------------------------------

Next, show_results is used to visualize and compare the actual and forecasted values to understand the model's performance. However, for multivariate modeling, the show_results function is currently experimental. Therefore, only the values displayed at the top of the graphs are appropriate, while the graphs themselves can be disregarded. They will be updated soon. The value at the top of each left-column graph is the actual target value, that is, the value immediately after the end of the sequence, whereas the value at the top of the corresponding right-column graph is the value forecasted by the trained model. The x-axis delineates the data, organized in batches, with each tick at a 6-day interval. The plot reveals that the ground truths are considerably close to the forecasted values, indicating a good fit. This is further validated by checking the model score.


View / edit / reply to this conversation on ReviewNB

BP-Ent commented on 2024-03-20T18:23:29Z ----------------------------------------------------------------

Once the model is trained, the predict function is used to forecast the next 144 timesteps after the last recorded timestep in the training dataset. In multi-step forecasting, we do not need to specify the number of future timesteps to forecast. The predict function automatically forecasts a number of timesteps equal to half of the sequence length used while preprocessing the data (here, 288 / 2 = 144). The sequence length should be chosen with this in mind.

Here, the model utilizes the same training dataset during the forecasting process and will use the last set of data points equivalent to the specified sequence length from the tail to predict future data points. So, it will forecast for the day of December 30th, at every 10 minutes of power consumption, starting on 00:00, 00:10, 00:20, etc., until 23:50 of the same day.
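The arithmetic behind this forecast window can be checked with a short sketch. It assumes that the last training timestamp is 23:50 on December 29th, 2017, which is inferred from the forecast starting at 00:00 on December 30th; the variable names are hypothetical.

```python
from datetime import datetime, timedelta

SEQUENCE_LENGTH = 288              # two days of 10-minute readings
HORIZON = SEQUENCE_LENGTH // 2     # multi-step forecasts cover half the sequence length
STEP = timedelta(minutes=10)

# Assumed last timestamp in the training data (inferred, not taken from the notebook).
last_training_timestamp = datetime(2017, 12, 29, 23, 50)

# Timestamps that the forecasted values correspond to.
forecast_index = [last_training_timestamp + STEP * (i + 1) for i in range(HORIZON)]

print(forecast_index[0], forecast_index[-1])
```

With a sequence length of 288, the horizon is 144 timesteps, which at 10-minute spacing covers exactly one day.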


View / edit / reply to this conversation on ReviewNB

BP-Ent commented on 2024-03-20T18:23:30Z ----------------------------------------------------------------

The r-squared value has improved compared to the one-step univariate method.
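For reference, r-squared is computed as one minus the ratio of the residual sum of squares to the total sum of squares. The following plain-Python sketch shows that definition; the function name `r_squared` and the sample values are hypothetical and are not results from the notebook.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_actual = sum(actual) / len(actual)
    ss_residual = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_total = sum((a - mean_actual) ** 2 for a in actual)
    if ss_total == 0:
        raise ValueError("r-squared is undefined when actual values are constant")
    return 1 - ss_residual / ss_total


# Made-up values purely to exercise the formula.
actual = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 2.0, 3.0, 5.0]
score = r_squared(actual, predicted)
```

A value closer to 1 means the forecasts explain more of the variation in the observed values, which is the sense in which one method's score can be said to improve on another's.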


View / edit / reply to this conversation on ReviewNB

BP-Ent commented on 2024-03-20T18:23:31Z ----------------------------------------------------------------

Finally, we will try one more method: multivariate forecasting with one step instead of multiple steps, to see whether it performs better than multi-step forecasting while still using multiple variables.

This combines the one-step and multivariate approaches. It predicts a single timestep in the future of a time series, but instead of using just one variable's historical data, it considers the multiple variables used in the previous step. This means that it takes into account the past values of several different factors (temperature, humidity, wind speed, etc.) when making a single-step prediction. As suggested earlier, a multivariate approach is useful when there are multiple variables that may collectively influence the future value being predicted.


View / edit / reply to this conversation on ReviewNB

BP-Ent commented on 2024-03-20T18:23:32Z ----------------------------------------------------------------

Data preparation for a time series consists of first formatting the input dataframe to be used for forecasting with a one-step multivariate model, followed by splitting the dataset into training and testing datasets.